It's perfectly valid to have multiple distinct prefixes configured on an interface, so remembering just one subnet isn't good enough in the general case, although it's certainly an improvement over remembering a single address.

I think a complete fix would be to remember all (interface, prefix) pairs that we're doing RAs on, and only (re)start fast RAs for the interface if the subnet isn't already being served RAs. I imagine this list already exists somewhere, since the RAs are being sent there, but it's been a while since I looked through the code.
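A minimal sketch of the (interface, prefix) bookkeeping suggested above, assuming a fixed-size table and a hook called on every "new address" event. The names `served_prefix`, `mask_prefix()` and `note_served_prefix()` are hypothetical and do not come from dnsmasq; whatever list dnsmasq already keeps for sending RAs may look quite different.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical registry of (interface, prefix) pairs already being served
 * RAs.  Illustration only; not dnsmasq's own data structure. */
#define MAX_SERVED 64

struct served_prefix {
  int if_index;
  struct in6_addr net;   /* network part only: host bits zeroed */
  int prefix_len;
};

static struct served_prefix served[MAX_SERVED];
static int served_count;

/* Return a copy of addr with every bit beyond prefix_len cleared. */
static struct in6_addr mask_prefix(const struct in6_addr *addr, int prefix_len)
{
  struct in6_addr out = *addr;
  assert(prefix_len >= 0 && prefix_len <= 128);
  for (int i = 0; i < 16; i++) {
    int bits = prefix_len - 8 * i;   /* network bits left for byte i */
    if (bits >= 8)
      continue;                      /* byte lies entirely inside the prefix */
    out.s6_addr[i] &= bits <= 0 ? 0 : (unsigned char)(0xff << (8 - bits));
  }
  return out;
}

/* Hypothetical hook for a "new address" event.  Returns true (and records
 * the pair) if this subnet was not yet served on if_index, i.e. fast RAs
 * should start.  If the table is full it returns true without recording,
 * falling back to restarting fast RAs every time. */
static bool note_served_prefix(int if_index, const struct in6_addr *addr,
                               int prefix_len)
{
  struct in6_addr net = mask_prefix(addr, prefix_len);
  for (int i = 0; i < served_count; i++)
    if (served[i].if_index == if_index && served[i].prefix_len == prefix_len &&
        memcmp(&served[i].net, &net, sizeof net) == 0)
      return false;                  /* already served: no new burst */
  if (served_count < MAX_SERVED) {
    served[served_count].if_index = if_index;
    served[served_count].net = net;
    served[served_count].prefix_len = prefix_len;
    served_count++;
  }
  return true;
}
```

With this, the static fc58:a22:180d:7800::1 and the SLAAC address in the same /64 map to one entry, so flapping between them no longer restarts the burst, while a genuinely new prefix or interface still does. A real fix would also need to drop entries when a prefix is removed, so that re-adding it later triggers fast RAs again.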

-- Maarten

On 11-09-2019 23:40, Simon Kelley wrote:
That's nasty.

I'm not sure how to properly solve this. I'm inclined to apply your
patch, on the grounds that it at least works better.....


On 02/09/2019 18:45, Petr Mensik wrote:
Yes, it seems the originating system is autoconfiguring the interface
from its own RA. I have modified the test to include ip monitor output.
The interface receives autoconfiguration a few seconds after the bridge
interface comes up.

I don't know how much the fact that a network namespace is used on a
bridge is involved; it should not matter. A bit suspicious is the STALE
router entry just before autoconfiguration. I doubt it is related, but
Avahi is trying mDNS on those interfaces. Of course, NetworkManager is
touching it too.

Since it is a custom interface created in a namespace, no other host can
send an RA to it. So I am positive it autoconfigures itself, at least on
my Fedora 29. The results are the same when only the bridge is used and
when loopback is also used.

14:32:22.711> 2: simbr    inet6 fc58:a22:180d:7800::1/64 scope global
14:32:25.289> fe80::6887:6dff:fe07:6f54 dev simbr lladdr 6a:87:6d:07:6f:54 router STALE
14:32:25.293> prefix fc58:a22:180d:7800::/64dev simbr onlink autoconf valid 1800 preferred 1800
14:32:27.317> 2: simbr    inet6 fc58:a22:180d:7800:6887:6dff:fe07:6f54/64 scope global dynamic mngtmpaddr
14:32:27.318> valid_lft 1798sec preferred_lft 1798sec


On 8/30/19 11:26 PM, Simon Kelley wrote:
This is useful information, but what I don't understand is where the
flooding comes from. Sure, this confusion means that unsolicited RA will
run every time there's a "new address" event, even if the new address
isn't on the expected interface, but I can't see how it generates more
"new address" events and therefore a flood of packets.

Unless, the originating system receives _its_own_ RA and that generates
a "new address" event?


On 28/08/2019 20:38, Petr Mensik wrote:

I have found what is going on.

The RA code seems to be switching between the dynamically assigned
address and the manually assigned address. It is just wrong to assume
there is one address on a physical interface, especially in the IPv6
world.

It seems my patch (attached), which checks just the subnet and does not
care about the exact address inside it, fixes the advertisement floods.
But I am not sure whether it also wrongly suppresses announcements for
new dynamic addresses that should still be sent. It might help to use
the valid parameter to distinguish between static and dynamic addresses.
I am unsure whether that is required for both or just the dynamic one.
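A rough sketch of the "check just the subnet" idea, assuming one remembered subnet per interface. This is not the attached patch; `same_subnet()`, `struct last_ra` and `should_restart_fast_ra()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* True if a and b agree on their first prefix_len bits. */
static bool same_subnet(const struct in6_addr *a, const struct in6_addr *b,
                        int prefix_len)
{
  assert(prefix_len >= 0 && prefix_len <= 128);
  int whole = prefix_len / 8, rest = prefix_len % 8;
  if (memcmp(a->s6_addr, b->s6_addr, whole) != 0)
    return false;
  if (rest == 0)
    return true;                     /* also covers prefix_len == 128 */
  unsigned char mask = (unsigned char)(0xff << (8 - rest));
  return (a->s6_addr[whole] & mask) == (b->s6_addr[whole] & mask);
}

/* Hypothetical per-interface memory of the last subnet fast RAs were
 * started for. */
struct last_ra {
  bool valid;
  struct in6_addr addr;
  int prefix_len;
};

/* Decide whether a "new address" event should restart fast RAs: only when
 * the subnet differs from the remembered one.  Flapping between ::1 and the
 * SLAAC address of the same /64 then no longer retriggers a burst. */
static bool should_restart_fast_ra(struct last_ra *last,
                                   const struct in6_addr *addr, int prefix_len)
{
  if (last->valid && last->prefix_len == prefix_len &&
      same_subnet(&last->addr, addr, prefix_len))
    return false;
  last->valid = true;
  last->addr = *addr;
  last->prefix_len = prefix_len;
  return true;
}
```

Because only one subnet is remembered, two distinct prefixes on the same interface would still alternate and restart the burst each time, which is the limitation raised at the top of this thread.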

I am sure it would still send once for a newly created interface. I
think that should be enough, right?

Some notes from debugging:

Breakpoint 1, construct_worker (scope=<optimized out>, flags=<optimized out>, preferred=<optimized out>, valid=1800,
     vparam=0x7ffc9afc2b60, if_index=2, prefix=64, local=0xa6dda4) at
2: /x *local = {__in6_u = {__u6_addr8 = {0xfc, 0x58, 0xa, 0x22, 0x18, 0xd, 0x78, 0x0, 0x8, 0x21, 0xd1, 0xff, 0xfe, 0x74, 0xec,
       0x2a}, __u6_addr16 = {0x58fc, 0x220a, 0xd18, 0x78, 0x2108, 0xffd1, 0x74fe, 0x2aec}, __u6_addr32 = {0x220a58fc, 0x780d18,
       0xffd12108, 0x2aec74fe}}}

Breakpoint 1, construct_worker (scope=<optimized out>, flags=<optimized out>, preferred=<optimized out>, valid=-1,
     vparam=0x7ffc9afc2b60, if_index=2, prefix=64, local=0xa6ddec) at
685                     ra_start_unsolicited(param->now, template);
2: /x *local = {__in6_u = {__u6_addr8 = {0xfc, 0x58, 0xa, 0x22, 0x18, 0xd, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1},
     __u6_addr16 = {0x58fc, 0x220a, 0xd18, 0x78, 0x0, 0x0, 0x0, 0x100}, __u6_addr32 = {0x220a58fc, 0x780d18, 0x0, 0x1000000}}}
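The `/x *local` lines print a glibc `struct in6_addr`: `__u6_addr8` holds the address in network byte order, while `__u6_addr16` and `__u6_addr32` are the same bytes read as little-endian host integers (hence 0x58fc for "fc58"). A small sketch that decodes the two byte arrays with the standard `inet_ntop()` suggests the first call (valid=1800) carries the SLAAC address and the second (valid=-1, presumably the netlink "infinite lifetime" 0xffffffff seen as a signed int, matching `valid_lft forever`) the static ::1, both inside fc58:a22:180d:7800::/64. The array and function names are made up for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Bytes copied from the __u6_addr8 arrays in the gdb output
 * (network byte order). */
static const unsigned char local_valid_1800[16] = {
  0xfc, 0x58, 0x0a, 0x22, 0x18, 0x0d, 0x78, 0x00,
  0x08, 0x21, 0xd1, 0xff, 0xfe, 0x74, 0xec, 0x2a };
static const unsigned char local_valid_forever[16] = {
  0xfc, 0x58, 0x0a, 0x22, 0x18, 0x0d, 0x78, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 };

/* Format 16 raw bytes as compressed IPv6 text.  buf should hold at least
 * INET6_ADDRSTRLEN bytes; returns NULL on failure, like inet_ntop(). */
static const char *format_in6(const unsigned char bytes[16], char *buf,
                              socklen_t len)
{
  struct in6_addr addr;
  memcpy(addr.s6_addr, bytes, sizeof addr.s6_addr);
  return inet_ntop(AF_INET6, &addr, buf, len);
}
```

Formatting the two arrays gives fc58:a22:180d:7800:821:d1ff:fe74:ec2a and fc58:a22:180d:7800::1, i.e. the "dynamic" and the static address from the ip output below, and their first 8 bytes (the /64) are identical.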

Corresponding ip link output:
2: simbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
     link/ether 0a:21:d1:74:ec:2a brd ff:ff:ff:ff:ff:ff
     inet scope global simbr
        valid_lft forever preferred_lft forever
     inet6 fc58:a22:180d:7800:821:d1ff:fe74:ec2a/64 scope global dynamic
        valid_lft 1699sec preferred_lft 1699sec
     inet6 fc58:a22:180d:7800::1/64 scope global
        valid_lft forever preferred_lft forever
     inet6 fe80::821:d1ff:fe74:ec2a/64 scope link
        valid_lft forever preferred_lft forever


On 8/27/19 10:42 PM, Maarten de Vries wrote:

I haven't dug very deep yet, but I can comment on the intent of the
particular commit: without it, dnsmasq didn't do any unsolicited RAs on
interfaces that are created after dnsmasq was started. It definitely
should do unsolicited RAs on those interfaces too, although obviously
not quite so many so often. I'm not sure why that happens. Note that the
commit didn't introduce the fast RAs, it only enabled unsolicited RAs
(including fast) for newly created interfaces too.

I wonder why this happens in those test cases and on at least one
Raspberry Pi, but not on my server. Is there any information you could
provide to pinpoint exactly when this bug triggers and when it doesn't?
For example: what happens if the virtual interface is created before
dnsmasq starts? Does it also trigger on bridge interfaces (which is what
I personally tested the commit with) for you?

I will attempt to investigate too, but I'm somewhat swamped for time so
I can't promise fast results.

Kind regards,


On 27-08-2019 10:45, Iain Lane wrote:
On Wed, Aug 21, 2019 at 08:59:07PM +0200, Petr Mensik wrote:
Hi Simon and Maarten,

we discovered while playing with NetworkManager-ci [1] that the latest
release is somehow broken. Tests running dnsmasq are quite slow on latest

I have created a repeatable start script that reproduces it, then used
git bisect to find when it broke. It seems the fast sending was
intentional in commit 0a496f059c1e9 [2], but maybe the way it affects
the system was underestimated. It is significant for systems that hit
this issue. I think it has to be fixed to slow down after a short time
interval instead of looping endlessly. Reported as Fedora bug [3].
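RFC 4861 (section 6.2.4) already bounds the fast phase: only the first MAX_INITIAL_RTR_ADVERTISEMENTS (3) unsolicited RAs after an interface becomes an advertising interface have their interval capped at MAX_INITIAL_RTR_ADVERT_INTERVAL (16 seconds). A minimal sketch of that rule follows; `next_ra_delay()` is a hypothetical helper, not dnsmasq's scheduler. The burst only ends if the `sent` count keeps growing; if every "new address" event restarts the fast phase, as the `ra_start_unsolicited()` call in the debugging notes above suggests, it never reaches the regular schedule.

```c
#include <assert.h>

/* Router constants from RFC 4861, section 10. */
#define MAX_INITIAL_RTR_ADVERT_INTERVAL 16u /* seconds */
#define MAX_INITIAL_RTR_ADVERTISEMENTS  3u

/* Hypothetical helper: delay in seconds before the next unsolicited RA,
 * given how many have been sent since the interface started advertising
 * and the (randomised) regular interval chosen for this round. */
static unsigned next_ra_delay(unsigned sent, unsigned chosen_interval)
{
  if (sent < MAX_INITIAL_RTR_ADVERTISEMENTS &&
      chosen_interval > MAX_INITIAL_RTR_ADVERT_INTERVAL)
    return MAX_INITIAL_RTR_ADVERT_INTERVAL;  /* capped initial phase */
  return chosen_interval;                    /* regular schedule */
}
```

For example, with a chosen interval of 600 seconds the first three RAs are spaced 16 seconds apart and the fourth onwards 600 seconds.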

Thanks for this, Petr. Would you be able to share the script you've
used, so that perhaps an upstream developer could recreate the bug?

Mainly I wanted to chime in and say that (in addition to the other
instance referenced), we found this in the NetworkManager testsuite in
Ubuntu. I didn't come up with a nice reproducer at the time, but we did
identify the same commit and we've reverted it in Ubuntu. I posted on
the ML back then but we didn't get much traction and I didn't follow up
very aggressively.

   (the commit ID referenced in the changelog there seems to be from
   somewhere else, but it's the same patch)


Dnsmasq-discuss mailing list

Dnsmasq-discuss mailing list

Dnsmasq-discuss mailing list

Dnsmasq-discuss mailing list

Dnsmasq-discuss mailing list

Dnsmasq-discuss mailing list

Reply via email to