Re: [Dnsmasq-discuss] following RFC6106 triggers bug in network-manager

Simon Kelley Thu, 07 Nov 2013 03:28:01 -0800

The determination of these lifetimes was changed in dnsmasq-2.67 to be based on the preferred lifetime of the prefix whose local address is being advertised, which is normally the prefix with the longest preferred lifetime on the interface. That seems to me to be a more sensible metric, and should address this problem.



Cheers,

Simon.


On 05/11/13 18:05, Dan Williams wrote:

On Tue, 2013-11-05 at 09:21 +0100, Gui Iribarren wrote:

Hello,
so, we started suffering frequent, periodic disconnects on clients since
upgrading dnsmasq 2.62 ->  2.66

tracking down the issue, it came down to a network-manager bug while
maintaining the RDNSS list, where an unhandled expiring RDNSS lifetime
results in a full reconnection


We've fixed a number of NM bugs in this area, specifically (a) adding
some elasticity before deciding the DNS servers have expired, and (b)
sending Router Solicitations before they have expired, to get updated
lifetimes.

problem is, the kernel only understands the *router* lifetime, but
ignores everything about the RDNSS lifetime; and if the latter is
shorter than the former, then the RDNSS expires before the kernel sends
a RS to handle the *router* expiring lifetime.


RFC6106 says the lifetime SHOULD be bounded by "MaxRtrAdvInterval <=
Lifetime <= 2*MaxRtrAdvInterval", which allows at least one dropped RA,
and which NM should compensate for with the above mentioned fixes.
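
For illustration, the bound quoted above can be written as a small check. This is only a sketch: the function name is made up here, and the 600-second MaxRtrAdvInterval in the usage lines is an assumption, not a value stated in this thread.

```python
def rdnss_lifetime_within_bounds(lifetime: int, max_rtr_adv_interval: int) -> bool:
    """True if MaxRtrAdvInterval <= Lifetime <= 2*MaxRtrAdvInterval (RFC 6106 SHOULD)."""
    return max_rtr_adv_interval <= lifetime <= 2 * max_rtr_adv_interval


# Assumed interval of 600 seconds, for illustration only.
ASSUMED_MAX_RTR_ADV_INTERVAL = 600

# The two RDNSS lifetimes reported in this thread.
print(rdnss_lifetime_within_bounds(1200, ASSUMED_MAX_RTR_ADV_INTERVAL))
print(rdnss_lifetime_within_bounds(1800, ASSUMED_MAX_RTR_ADV_INTERVAL))
```

Under that assumed interval, 1200 seconds sits at the upper edge of the recommended range, while 1800 seconds (RDNSS lifetime equal to router lifetime) falls outside it.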

As you say below, the bug was fixed in NM 0.9.6 (released 2012-08-07, so
over a year ago) and I'd recommend that the distro just upgrade to get
the fixes instead of hacking around the issue in dnsmasq, which is
following the RFC.  It was clearly a NetworkManager bug.

Dan

in dnsmasq 2.62, router lifetime was equal to RDNSS lifetime, as shown
below:

# rdisc6 wlan0
Soliciting ff02::2 (ff02::2) on wlan0...

Hop limit                 :           64 (      0x40)
Stateful address conf.    :           No
Stateful other conf.      :           No
Mobile home agent         :           No
Router preference         :       medium
Neighbor discovery proxy  :           No
Router lifetime           :         1800 (0x00000708) seconds
[...]
   Recursive DNS server     : fe80::fad1:11ff:fe54:3381
    DNS server lifetime     :         1800 (0x00000708) seconds
   from fe80::fad1:11ff:fe54:3381

this prevented the situation where the network-manager bug would happen:
as the kernel would issue a RS to renew the router lifetime, the RDNSS
was renewed as well, just in time

in network-manager 0.9.6 the bug is fixed (NM sends a RS by itself,
before RDNSS expires, independent of RtrAdvLifetime)
but notably debian squeeze still ships 0.9.4, which reconnects to the
network every 20 minutes when talking to a dnsmasq v2.66 (worked well
against v2.62)

Router lifetime           :         1800 (0x00000708) seconds
    DNS server lifetime     :         1200 (0x000004b0) seconds
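
The 20-minute cycle follows directly from these two numbers. A minimal sketch, assuming a simplified client that refreshes nothing until one of the lifetimes runs out (the function name is hypothetical):

```python
def first_to_expire(router_lifetime: int, rdnss_lifetime: int) -> str:
    """Name the lifetime that runs out first, assuming no refresh in between.

    When both are equal, the router-lifetime RS refreshes the RDNSS as well,
    so "router" is reported.
    """
    return "rdnss" if rdnss_lifetime < router_lifetime else "router"


# (dnsmasq version, router lifetime, RDNSS lifetime) as seen with rdisc6 above
for version, router, rdnss in [("2.62", 1800, 1800), ("2.66", 1800, 1200)]:
    print(version, first_to_expire(router, rdnss), rdnss // 60, "minutes")
```

With the 2.66 values the RDNSS entry is the first thing to expire, after 1200 / 60 = 20 minutes, which matches the reconnect period seen with network-manager 0.9.4.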

then, even though it's the debian/etc maintainers who should fix their
packages...

    https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/993571

can we anyway consider going back to the old behaviour in dnsmasq, to
help mitigation?
(RtrAdvLifetime = RDNSSLifetime)

i understand v2.66 follows RFC6106

       Lifetime      32-bit unsigned integer.  The maximum time, in
                     seconds (relative to the time the packet is sent),
                     over which this RDNSS address MAY be used for name
                     resolution.  Hosts MAY send a Router Solicitation to
                     ensure the RDNSS information is fresh before the
                     interval expires.  In order to provide fixed hosts
                     with stable DNS service and allow mobile hosts to
                     prefer local RDNSSes to remote RDNSSes, the value of
                     Lifetime SHOULD be bounded as
                     MaxRtrAdvInterval <= Lifetime <= 2*MaxRtrAdvInterval
                     where MaxRtrAdvInterval is the Maximum RA Interval
                     defined in [RFC4861].  A value of all one bits
                     (0xffffffff) represents infinity.  A value of zero
                     means that the RDNSS address MUST no longer be used.
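
To make the Lifetime field concrete, here is a rough sketch of reading it out of a raw RDNSS option as laid out in RFC 6106 (type 25, length in units of 8 octets, 16 reserved bits, 32-bit lifetime, then one or more IPv6 addresses). The function name is made up for this example, and the sample option is built by hand rather than captured from a real RA.

```python
import ipaddress
import struct

RDNSS_OPTION_TYPE = 25          # ND option type for RDNSS (RFC 6106)
INFINITE_LIFETIME = 0xFFFFFFFF  # all-one bits means infinity


def parse_rdnss_option(data: bytes):
    """Parse one RDNSS option and return (lifetime_seconds, [IPv6Address, ...])."""
    if len(data) < 8:
        raise ValueError("option too short")
    # Type (1 byte), Length (1 byte), Reserved (2 bytes), Lifetime (4 bytes)
    opt_type, length_units, _reserved, lifetime = struct.unpack("!BBHI", data[:8])
    if opt_type != RDNSS_OPTION_TYPE:
        raise ValueError("not an RDNSS option")
    total = length_units * 8
    # 8-byte header plus N 16-byte addresses: length is odd and at least 3
    if length_units < 3 or length_units % 2 == 0 or len(data) < total:
        raise ValueError("bad option length")
    servers = [ipaddress.IPv6Address(data[off:off + 16])
               for off in range(8, total, 16)]
    return lifetime, servers


# Hand-built option carrying the server address and lifetime from this thread.
option = struct.pack("!BBHI", RDNSS_OPTION_TYPE, 3, 0, 1200) + \
    ipaddress.IPv6Address("fe80::fad1:11ff:fe54:3381").packed
lifetime, servers = parse_rdnss_option(option)
print(lifetime, [str(s) for s in servers])
```

A lifetime of zero means the address must no longer be used, and `INFINITE_LIFETIME` means it never expires; a client would need to handle both before starting any expiry timer.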

but this RFC has been criticised already[1] (since it creates a fragile
situation, where a single or couple of RA packet losses - common in wifi
scenarios - are enough to lose the race condition)

      [1]: https://bugzilla.redhat.com/show_bug.cgi?id=753482#c38

and using RtrAdvLifetime = RDNSSLifetime only defies the "SHOULD"
keyword used in the RFC, strictly speaking.
in addition, dnsmasq (contrary to radvd) actually provides the RDNSS
service itself, so it shouldn't be much of an issue to announce a
longer lifetime for that?

just a thought :)

Cheers!

gui

_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss



