Hi Simon,

The commit fixes all the issues we were seeing. Thanks for getting the fix
out so quickly.

I had one follow up. So now it seems that for all clients retries will use
the same SP/QID. Would it be possible to have a way/config to vary SP on
retries or are we stuck with a single SP due to the CVE? The reason we'd
prefer varying SPs is mostly due to flow hashing. Say dnsmasq is configured
with a single upstream nameserver. That means any retries will use the same
5-tuple and retries will follow the same network path. If some paths in the
network have an outage then we are stuck on that path for all retries.  In
general, we find better DNS availability when SP varies across retries and
we can traverse different paths on the network. Wondering if you had any
thoughts on this?

Thanks,
Nick

On Wed, Feb 17, 2021 at 4:03 PM Simon Kelley <si...@thekelleys.org.uk>
wrote:

> On 16/02/2021 00:42, Nicholas Mu wrote:
> > Hi,
> >
> > I noticed a low level increase in DNS errors after upgrading to 2.84.
> > After doing some packet diving, it seems that retries behave differently
> > in the new version. For my testing, I'm using dnspython but I believe
> > this issue would affect any client that uses different source ports and
> > query ids for retries. As a result, dnspython will attempt retries for
> > up to 30 seconds and will eventually timeout as only a single packet is
> > ever sent and retries are rendered ineffective.
> >
> > On 2.82, multiple packets are sent as dnspython retries. Note the
> > retries are using different source ports and query ids:
> >
> > |[ec2-user@ip-172-31-44-29 src]$ grep cell-1 /tmp/dnsmasq-2.82
> > 19:59:03.826638 IP 172.31.44.29.44547 > 172.31.0.2.53: 51880+ NS?
> > somedomain. (64)
> > 19:59:05.928335 IP 172.31.44.29.33363 > 172.31.0.2.53: 41382+ NS?
> > somedomain. (64)
> > 19:59:08.130620 IP 172.31.44.29.21177 > 172.31.0.2.53: 36073+ NS?
> > somedomain. (64)
> > 19:59:10.532792 IP 172.31.44.29.57223 > 172.31.0.2.53: 50309+ NS?
> > somedomain. (64)|
> > |
> > |
> > |On 2.84, only a single packet is sent:|
> > |
> > |
> > |[ec2-user@ip-172-31-44-29 src]$ grep cell-1 /tmp/dnsmasq-2.84
> > 19:53:12.189849 IP 172.31.44.29.5335 > 172.31.0.2.53: 826+ NS?
> > somedomain. (64)|
> > |
> > |
> > I also tested using dig, nslookup, and host which all use the same
> > source port and query id on retries. The behavior works as intended on
> > both versions. I would suspect the following commit is responsible for
> > this behavior change:
> >
> >       Handle multiple identical near simultaneous DNS queries better.
> >       Previously, such queries would all be forwarded
> >       independently. This is, in theory, inefficent but in practise
> >       not a problem, _except_ that is means that an answer for any
> >       of the forwarded queries will be accepted and cached.
> >       An attacker can send a query multiple times, and for each repeat,
> >       another {port, ID} becomes capable of accepting the answer he is
> >       sending in the blind, to random IDs and ports. The chance of a
> >       succesful attack is therefore multiplied by the number of repeats
> >       of the query. The new behaviour detects repeated queries and
> >       merely stores the clients sending repeats so that when the
> >       first query completes, the answer can be sent to all the
> >       clients who asked. Refer: CVE-2020-25686.
> >
> > Is this intended? Seems to me any clients with retry behavior similar to
> > dnspython are now broken. Clients will hang until their configured
> > timeouts are reached on any single DNS failure.
> >
> > Thanks,
> >
> > Nick
> >
>
>
> Your analysis is spot-on.
>
> I think it's possible to satisfy both the security and robustness
> requirements here.
>
> Pre 2.84, a retry for the same query with different query-id and/or
> source port would be treated as an independent query and forwarded
> again, with a new source-port and query-id. This gives the attacker the
> ability to increase the attack surface for cache pollution by sending
> many repeat queries.
>
> In 2.84 a repeat with the same SP/QID gets treated as it always has: as
> a retry, and the query gets forwarded again, and this time to all
> available servers.  The same query but with different SP/QID now gets
> piggy-backed onto the existing query, as you noted.
>
> The solution here is for a repeat query with different SP/QID to trigger
> the same retry behaviour as a repeat query with the same SP/QID, but
> also to still piggy back the existing query, so that no new SP/QID
> tuples are generated going upstream.
>
> I just pushed
>
> http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=141a26f979b4bc959d8e866a295e24f8cf456920
> which should implement this. Please test!
>
>
> Cheers,
>
> Simon.
>
>
>
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
>
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss

Reply via email to