A quick follow-up.

I looked at other customers running 18.12.1 who reported problems at the exact 
same time as the AWS issue described below.

We are seeing similar behavior.
On these systems, the third STUN attempt also fails, but we were able to answer 
the call because the SIP provider didn't CANCEL it.
Upstream of the service provider, however, the calls had already been 
terminated.
The result is a call from the SIP provider to Asterisk that is live but has no 
caller, so it appears to be dead air.

Does the res_rtp_asterisk stunaddr DNS TTL expiration mentioned in change ID 
I7955a046293f913ba121bbd82153b04439e3465f require dnsmgr to be enabled in 
dnsmgr.conf?

Dan


From: Dan Cropp
Sent: Monday, February 6, 2023 2:06 PM
To: Asterisk Users Mailing List - Non-Commercial Discussion 
<asterisk-users@lists.digium.com>
Subject: Asterisk rtp.conf stunaddr setting - what happens if there is an outage

Over the weekend, we had several customers running at AWS.  AWS had an outage 
during this time.

This customer is running Asterisk 16.23.0 (which has the STUN timeout crash 
fix).
From what I have been told, other customers are running the newer Asterisk 
18.12.1 but encountered similar issues.  (I haven't had a chance to verify 
this.)
All these customers should be running PJSIP, but I haven't had a chance to 
verify that either.


The logs show Asterisk was reporting problems communicating with the STUN 
address configured in rtp.conf:

[02/04 00:15:03.812] NOTICE[5943] stun.c: Attempt 1 to send STUN request to 'x.x.x.x' timed out.
[02/04 00:15:06.812] NOTICE[5943] stun.c: Attempt 2 to send STUN request to 'x.x.x.x' timed out.
[02/04 00:15:09.813] WARNING[5943] stun.c: Attempt 3 to send STUN request to 'x.x.x.x' timed out. Check that the server address is correct and reachable.
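
As a rough illustration of what these three lines say about timing, the sketch below parses them and prints the gap between consecutive timeouts. The regular expression is written only against the lines quoted here, the helper names are made up for this example, and the year 2023 is an assumption taken from the date of this email because the log lines carry no year.

```python
import re
from datetime import datetime

# Matches the stun.c timeout lines quoted in this message (not a general
# Asterisk log parser).
LINE_RE = re.compile(
    r"\[(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\] "
    r"(?P<level>NOTICE|WARNING)\[\d+\] stun\.c: "
    r"Attempt (?P<attempt>\d+) to send STUN request"
)

ASSUMED_YEAR = "2023"  # assumption: the log has no year, the email is from 2023


def parse_stun_timeouts(lines):
    """Return (timestamp, level, attempt) for every STUN timeout line."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(
                ASSUMED_YEAR + " " + m.group("ts"), "%Y %m/%d %H:%M:%S.%f"
            )
            events.append((ts, m.group("level"), int(m.group("attempt"))))
    return events


def gaps_seconds(events):
    """Seconds between consecutive timeout reports."""
    return [
        round((b[0] - a[0]).total_seconds(), 3)
        for a, b in zip(events, events[1:])
    ]


SAMPLE = [
    "[02/04 00:15:03.812] NOTICE[5943] stun.c: Attempt 1 to send STUN request to 'x.x.x.x' timed out.",
    "[02/04 00:15:06.812] NOTICE[5943] stun.c: Attempt 2 to send STUN request to 'x.x.x.x' timed out.",
    "[02/04 00:15:09.813] WARNING[5943] stun.c: Attempt 3 to send STUN request to 'x.x.x.x' timed out. Check that the server address is correct and reachable.",
]

events = parse_stun_timeouts(SAMPLE)
print(gaps_seconds(events))
```

The gaps come out at roughly 3 seconds each, so a full cycle of three attempts appears to take about 9 seconds if the first attempt also waits 3 seconds (its start is not in the log).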

Until Asterisk was reset, the same pattern kept happening.

Asterisk receives an INVITE.
It immediately sends the 100 Trying.
7 seconds later, Asterisk receives a CANCEL from the SIP provider.
Another half second later, Asterisk receives a second CANCEL.
A second later, Asterisk receives a third CANCEL.
After the third STUN request attempt fails, Asterisk sends a 200 OK response 
for the CANCEL CSeq.
This is followed by a 487 Request Terminated.
Then a second 200 OK response for the CANCEL CSeq.
Then a third 200 OK response for the CANCEL CSeq.
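
To make the timing in this sequence easier to see, here is a small sketch that lines up the CANCEL arrival times against the point where the STUN request gives up. It rests on two assumptions that the trace does not confirm: that the STUN request starts when the INVITE arrives, and that each of the three attempts waits about 3 seconds (inferred from the log timestamps). The names are made up for this example.

```python
# Assumptions (not confirmed by the trace): STUN starts at INVITE time and
# each attempt times out after about 3 seconds, as the log timestamps suggest.
STUN_ATTEMPT_TIMEOUT = 3.0  # seconds
STUN_ATTEMPTS = 3

# Seconds after the INVITE at which each CANCEL arrives, from the sequence
# above: 7 s, then half a second later, then one second after that.
cancel_offsets = [7.0, 7.0 + 0.5, 7.0 + 0.5 + 1.0]

stun_gives_up_at = STUN_ATTEMPT_TIMEOUT * STUN_ATTEMPTS


def cancels_before(deadline, offsets):
    """CANCELs that arrive while the STUN request is still pending."""
    return [t for t in offsets if t < deadline]


pending = cancels_before(stun_gives_up_at, cancel_offsets)
print("STUN gives up at", stun_gives_up_at, "s after the INVITE")
print("CANCELs received before that:", pending)
```

Under those assumptions all three CANCELs arrive before the third STUN attempt times out, which would be consistent with the three 200 OK responses only going out afterwards.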

We have an AMI connection.  At this point, we see the Newchannel event for 
this channel.
Asterisk then immediately sends various events for the channel, including the 
Event: Hangup indicating the channel has ended.

63 ms later, it receives an ACK which completes the Call-ID processing.


This went on for over 8 hours.
When they restarted the Asterisk box, everything was fine.  I have been told 
they had to restart every Asterisk instance we had running at AWS to clear the 
STUN request timeouts.  No calls/channels would work until that was done.

I wonder whether the STUN address lookup happens only once, and whether AWS DNS 
changed something during this outage/recovery.
Is there a recommendation on how to prevent this from happening?
Any thoughts?
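
One way to test the "resolved only once" theory from outside Asterisk would be to compare the address seen at startup with a fresh lookup later. The sketch below is only an illustration of that check: it is not how Asterisk resolves stunaddr, the function names and the host name stun.example.com are hypothetical, and the addresses are documentation placeholders. Port 3478 is the standard STUN port.

```python
import socket


def resolve_ipv4(host, port=3478):
    """Resolve host to a sorted list of IPv4 addresses via the system resolver."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    return sorted({info[4][0] for info in infos})


def address_changed(cached, host, resolver=resolve_ipv4):
    """Compare a cached address list with a fresh lookup.

    Returns (changed, current_addresses).
    """
    current = resolver(host)
    return set(current) != set(cached), current


# Offline demonstration with a stand-in resolver so that no DNS query is made.
# A real check would call address_changed(cached, "your-stun-host") instead.
def fake_resolver(host):
    return ["198.51.100.7"]


changed, current = address_changed(
    ["192.0.2.10"], "stun.example.com", resolver=fake_resolver
)
print(changed, current)
```

If a check like this reported a change while Asterisk kept timing out against the old address, that would support the stale-lookup theory.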


Dan
