On 05/06/2014 10:56 AM, Olle E. Johansson wrote:
On 06 May 2014, at 16:11, Daniel McFarlane <[email protected]> wrote:
Hi All,
I've been working with Asterisk for 2.5 years now but I am new to the mailing
list. Hoping to get some answers here..
I've written a piece of software to control Asterisk via AMI. I'm able to
login, making outbound calls, receive/record calls, handle digits, put people
into conferences, etc. Everything seems to work well, at a low to medium call
volume.
I've been stress testing the system and found what seems to be a memory issue
within Asterisk. However, after over a day of searching I can't even find any
bug reports or anything to bring me closer to a solution..but still hoping to
find a solution or that someone can help me find one..
I connect to the AMI successfully. In my test I am making an outbound call through a sip
"device" I configured within sip.conf. This sip "device" basically connects
back into the same Asterisk I am using to generate the calls.
If I generate outbound calls (i.e.: Via Action: Originate) using a low call
volume everything seems to be fine. e.g.: I can generate 10 calls and they all
go through and complete successfully (with a total of 20 calls bring processed
by the same Asterisk, due to both outbound and inbound calls).
The problem arises If I bombard (i.e.: Write all the Originate requests within about 7
seconds) the AMI to generate 97 outbound calls (for a total of 194 channels). The first
Originate commands seem to get processed, but then all of a sudden (after Asterisk seems
to have started processing a good amount of my Originate requests already) Asterisk seems
to reset the AMI interface! Without even having a dropped connection I receive a new
"Asterisk Call Manager/1.3" string and then previous commands that I issued
start coming back with Response: Error and Message: Permission denied.
I've enabled debugging and here's the security log output when I connect:
[Apr 1 12:43:25] SECURITY[9877] res_security_log.c:
SecurityEvent="SuccessfulAuth",EventTV="1396370605-551714",Severity="Informational",Service="AMI",EventVersion="1",AccountID="username",SessionID="0x7f0c540d96b8",LocalAddress="IPV4/TCP/0.0.0.0/5038",RemoteAddress="IPV4/TCP/192.168.1.62/52139",UsingPassword="0",SessionTV="1396370605-551710"
(This is the time when I am issuing commands, it is accepting and processing
them)
Then all of a sudden (when I receive a new "Asterisk Call Manager/1.3" string and
commands are rejected with "Permission denied") here is what the log shows:
[Apr 1 12:44:36] SECURITY[9877] res_security_log.c:
SecurityEvent="RequestNotAllowed",EventTV="1396370676-876795",Severity="Error",Service="AMI",EventVersion="1",AccountID="",SessionID="0x7f0c54072968",LocalAddress="IPV4/TCP/0.0.0.0/5038",RemoteAddress="IPV4/TCP/192.168.1.62/52144",RequestType="Action:
Originate",SessionTV="0-0"
(Note: How the AccountID is now empty)
At this same time the CLI shows:
== Manager '*****' logged off from 192.168.1.**
..yet it wasn't my application that disconnected and/or issued an Action:
logoff to Asterisk (my application doesn't even detect a connection drop, so it
seems like Asterisk just logs the manager user out!).
Why did Asterisk all of a sudden reset itself and/or lose reference to the user
I am authenticating with?
Note: My maxcalls setting within asterisk.conf was set to default. I even tried
upping it to 200 and that didn't help. My maxload setting was
untouched/commented out and therefore set to default (no threshold). I still
tried upping it to 6.0, 8.0 and 10.0 to no avail. I even uncommented
minmemfree and set it to 256 (although calling free -m from a terminal showed
their to always be plenty of memory on my system).
Does anyone have an idea how to fix/correct this?
One thing you have to be aware of is that Action:s are atomic. You have to wait
for the response of one action before you send the next one. Otherwise they
will all end up in the TCP buffer. That's propably what you see - a lot of
queued up messages being red after the tcp connection is reset.
That doesn't explain why something resets and send you a new connect message
and possibly requires a new login. That seems like a bug that needs to be
reported. Can you easily repeat this?
Just my 5 öre. Anyone else with good comments on this?
/O
Thank you for the quick responses.
First off, as Tony pointed out I did forget to include my Asterisk
version. I did realize it right after sending the first message, but
figured I'd send it in my first response. Sorry about that. I'm running
Asterisk 11.2.1.
Olle: When associating with 1 audio channel I do wait for the response
before issuing the next command. In this case, however, my software
internally treats each as a separate call (i.e.: As if I had issued
Originate requests to separate DAHDI trunks, instead of a IAX/SIP `pipe`
that handles all the SIP/IAX calls) so I do actually send multiple
back-to-back Action: Originate commands without waiting for a response
(but still wait for the OriginateResponse before issuing more commands
for that same call, as necessary given that I need the specific callno
from Asterisk before I can issue more commands). I assumed that
Asterisk would create a separate IAX/SIP entity/call for each Originate
response (which it seems to do well under low volume, e.g.: 10 calls) &
that this would therefore be ok.
Also, after for instance sending 6 Originate requests Asterisk does come
back with:
e.g.:
Response: Success
ActionID: 819367539
Message: Originate successfully queued
..for each command I issued. Is this not ok? I can see how waiting to
issue a response prior to to issuing another Action: Originate command
would prevent a lot of TCP commands from being queued in the TCP buffer,
but still seems like Asterisk should be handling it ok. Well, unless I
overload it's queue or something..
You are right about the commands all ending up in the TCP buffer. I
also concluded that the "Permission denied" messages were the result of
Asterisk trying to process from the TCP buffer, after it performed it's
reset.
Yes, I can fairly easily reproduce this. Here are some details to
understand on how to reproduce:
1) I have noticed that if I say issue 100 calls in blocks of 10 calls
every 2 seconds that Asterisk seems to be able to handle it..just not 97
calls at one time.
2) Strangely, here's one I can't explain. If I've been connected via
the AMI port for some time and then start my tests it seems to be ok.
If I connect, perform a test in blocks (as in #1 above) (kind of like
warming it up) and then bombard with 97 calls at a time, it seems to
hold up.
3) The easiest way for me to reproduce it is by disconnecting my
software and reconnecting just prior to bombarding with 97 calls. Now
I'm pulling at strings to explain this, but is it possible that Asterisk
needs some set up time after login or treats this as a kind of spam
attack? Perhaps that could explain why it resets the connection? I'd
have to do more testing to see if I can get it to reset after being
connected for a while..
Tony: I am certain it isn't my software performing or detecting a
disconnect as I have logs to show such errors, however, I will perform a
tcpdump as you suggested and let you know my results/answer the rest of
your questions in a further email, once I have completed that test..
Thank You
--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --
asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-dev