On 2/17/21 8:45 PM, sdut...@wazo.io wrote:
<snip>
I've found a related Asterisk issue with similar symptoms, but a
different cause (WAN unavailable):

https://issues.asterisk.org/jira/browse/ASTERISK-22745

I'm willing to propose a patch in Asterisk to avoid the delay when the
STUN server changes its IP address. I'm wondering what the best
strategy is to make Asterisk resolve the stunaddr again. Here are the
solutions that I've come up with:

1. Resolve the stunaddr hostname at every call

This is the strategy used for turnaddr.
This adds another chance of timeout when placing a call, e.g. if the DNS
resolver is unavailable.
This also adds a delay for every call, i.e. the time for the stunaddr to
be resolved to an IP address.

2. Keep the stunaddr cache in memory, and refresh it after the first timeout

This strategy is used in res_stun_monitor.

3. Keep the stunaddr cache in memory, and refresh it periodically

What would be an acceptable default refresh frequency?

4. Keep the stunaddr cache in memory, and refresh it after the DNS
response TTL

AFAIK, this requires making an explicit DNS query, instead of relying on
the OS name-resolving facilities like getaddrinfo. Maybe
res_resolver_unbound could be used there? Is it a good idea to add a
dependency from res_rtp_asterisk to res_resolver_unbound? Make the
dependency optional with a configuration flag e.g.
"stunaddr_resolve_frequency=auto" (default="once")?

5. Some program (either an Asterisk module thread or some external
process) continuously checks the IP address of the STUN server and runs
"module reload res_rtp_asterisk.so" when the IP address changes.

This is more of a crutch than a real solution.

My preference goes to solution 4, and if not possible, then solution 2.
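
For illustration, solution 2 (keep the resolved stunaddr in memory and re-resolve only after a timeout) could look roughly like the sketch below. This is not code from res_rtp_asterisk or res_stun_monitor: the struct stun_cache type and the stun_cache_* functions are hypothetical names made up for this example, and only getaddrinfo/freeaddrinfo are real POSIX calls. The idea is that the STUN request path calls stun_cache_get(), and the code that detects a STUN timeout calls stun_cache_invalidate() so that the next call triggers a fresh lookup.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical cache for the resolved stunaddr; not an Asterisk structure. */
struct stun_cache {
    const char *host;              /* configured stunaddr hostname */
    const char *port;              /* service name or port, e.g. "3478" */
    struct sockaddr_storage addr;  /* last resolved address */
    socklen_t addrlen;             /* length of the address stored in addr */
    int valid;                     /* non-zero while addr may be reused */
    unsigned int resolve_count;    /* number of lookups performed so far */
};

void stun_cache_init(struct stun_cache *c, const char *host, const char *port)
{
    assert(c != NULL && host != NULL && port != NULL);
    memset(c, 0, sizeof(*c));
    c->host = host;
    c->port = port;
}

/* Return 0 and copy the cached address to *out, resolving only when needed. */
int stun_cache_get(struct stun_cache *c, struct sockaddr_storage *out,
                   socklen_t *outlen)
{
    assert(c != NULL && out != NULL && outlen != NULL);

    if (!c->valid) {
        struct addrinfo hints;
        struct addrinfo *res = NULL;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_DGRAM;

        if (getaddrinfo(c->host, c->port, &hints, &res) != 0 || res == NULL) {
            return -1; /* lookup failed; the cache stays invalid */
        }
        /* Keep the first result only; a real module might try them all. */
        memcpy(&c->addr, res->ai_addr, res->ai_addrlen);
        c->addrlen = res->ai_addrlen;
        c->valid = 1;
        c->resolve_count++;
        freeaddrinfo(res);
    }

    *out = c->addr;
    *outlen = c->addrlen;
    return 0;
}

/* Called after a STUN timeout: forces a new lookup on the next get. */
void stun_cache_invalidate(struct stun_cache *c)
{
    assert(c != NULL);
    c->valid = 0;
}
```

The sketch has no locking. In a real module the cache would be shared between threads, so reads and invalidation would need to be protected.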

My questions, then:

Do you know of any discussion about this topic?
What are your preferences regarding a solution?
Do you have better strategies to propose?
Does solution 4 go in the right direction?
Would it be better to have the same strategy for stunaddr and turnaddr
(currently solution 1)?

Not entirely related, but I made a patch a while back for chan_sip to solve a DNS issue with qualify (NOTIFY). In our case, some customers had a flaky LAN/WAN connection, which caused DNS failures and subsequently made Asterisk believe peers were offline. I recall DNS resolution in chan_sip being performed on reload only, which exacerbated the problem. The patch I wrote made chan_sip attempt DNS resolution "later" for any unresolved addresses, skipping qualify until they were resolved. As qualify is performed regularly, any DNS problems eventually resolved themselves, which made for happy customers. The tricky part was getting all the locking and ref-counting right, as the peer address and the peer itself are used for lots of things other than qualify.

Considering this pattern, I would add an option 6 (similar to 5):

Instead of dead STUN servers being discovered at call setup, one could periodically check whether the server is alive (STUN qualify?) and refresh the address as needed. The 10-second delay you are seeing would still be unavoidable if the STUN server dies just before call setup, but at least the probability of such delays would be reduced thanks to early discovery. Respecting the TTL would be nice to have. Skipping checks for recently used STUN servers, to reduce network spam, would also be nice to have.

In addition to the above, reducing the timeout to 2-3 seconds and failing the call on timeout might be a better caller experience. By the time the caller redials, the network issues may well have resolved themselves.
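
To make the decision logic of option 6 a bit more concrete, here is a rough sketch of how a background monitor might choose its next action. All names (struct stun_monitor, stun_monitor_next and so on) are hypothetical and do not exist in Asterisk; the actual STUN probe and DNS lookup are left out, and the TTL is assumed to be supplied by whichever resolver is used. The rules are the ones described above: re-resolve when the TTL has expired or a probe has failed, skip the probe when the server was recently used by a real call, and otherwise probe once per interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-server monitor state; not an Asterisk structure. */
struct stun_monitor {
    time_t last_used;     /* last successful STUN exchange during a call */
    time_t last_probe;    /* last background probe, successful or not */
    time_t resolved_at;   /* when the hostname was last resolved */
    long ttl;             /* DNS TTL in seconds, 0 if unknown */
    long probe_interval;  /* seconds between background probes */
    bool needs_resolve;   /* set when a probe failed */
};

enum stun_action { STUN_IDLE, STUN_PROBE, STUN_RESOLVE };

/* Decide what the monitor should do at time 'now'. */
enum stun_action stun_monitor_next(const struct stun_monitor *m, time_t now)
{
    assert(m != NULL);

    /* A failed probe or an expired TTL means the address must be refreshed. */
    if (m->needs_resolve) {
        return STUN_RESOLVE;
    }
    if (m->ttl > 0 && difftime(now, m->resolved_at) >= (double) m->ttl) {
        return STUN_RESOLVE;
    }
    /* A server that recently answered a real call needs no extra probe. */
    if (difftime(now, m->last_used) < (double) m->probe_interval) {
        return STUN_IDLE;
    }
    /* Otherwise probe at most once per interval. */
    if (difftime(now, m->last_probe) < (double) m->probe_interval) {
        return STUN_IDLE;
    }
    return STUN_PROBE;
}

/* Record a probe result; a failure requests a fresh lookup. */
void stun_monitor_probe_done(struct stun_monitor *m, time_t now, bool ok)
{
    assert(m != NULL);
    m->last_probe = now;
    if (!ok) {
        m->needs_resolve = true;
    }
}

/* Record a completed lookup together with the TTL it returned. */
void stun_monitor_resolved(struct stun_monitor *m, time_t now, long ttl)
{
    assert(m != NULL);
    m->resolved_at = now;
    m->ttl = ttl;
    m->needs_resolve = false;
}
```

Keeping the decision separate from the network code makes the policy easy to test with fixed timestamps, and the locking question is limited to whoever owns the struct.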

--
Dennis Buteyn
Xorcom Ltd

