Resending this because I realized I sent it to Simon rather than the list: Hi Simon,
This bug is fairly easy to reproduce. It can take 10 mins or more to reproduce a crash so I suggest checking the length you get in cache_recv_insert. If the domains being used are > 4 characters, this will catch the bug the first time it occurs. That usually happens within a minute. Something like this: if (m > MAXDNAME) abort(); Then bombard dnsmasq with TCP queries: docker run --rm -v /root/dns_queries:/dns_queries aiys0211/flamethrower_v0.10.2 -f /dns_queries/queryfile-example-100thousand -l 1000 -c 20 -q 10 -d 300 -p 53 -P tcp <ip of dnsmasq> I used the first 100 thousand entries from here: https://www.dns-oarc.net/files/dnsperf/data/queryfile-example-10million-201202.gz Frank On Sun, Jun 28, 2020 at 1:58 PM Simon Kelley <si...@thekelleys.org.uk> wrote: > > That's a nasty bug, and could explain what pi-hole users are seeing. > > If I understand things correctly, this bug will only manifest itself > when the write() or read() syscalls return EINTR ir EAGAIN, which is > possible, but not common, hence the bugs wasn't detected earlier. > > Frank, did you find a way to reproduce this on demand, or just capture > sporadic instances? > > I've applied Frank's patch, but immediately superseded it with a more > general clean-up which should have the same effect. I've tagged a > 2.82test1 release with this. If that fixes the problem for pi-hole, I > plan to go to 2.82 fairly quickly: I don't want this hanging around. > > It's slightly ironic that the original change in 2.81 is to fix a > theoretical hang which I've never actually seen and cannot reproduce, > but it introduced a really crash bug. > > > Cheers, > > Simon. > > > > > On 18/06/2020 22:42, Dominik wrote: > > Hey Frank, dear list-members, > > > > thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81 > > early on and we're seeing reports for mysterious crashes scattered all > > over the dnsmasq code since the very first days of releasing our latest > > version. Crashes reports show several locations, e.g., in_zone(), > > iface_check() and even main() in a few places. TCP forks were always > > happening sometime (not necessarily immediately) before a crash. > > Running dnsmasq in debug mode (prevents forking) stopped any crashes. > > > > So far, we haven't had the time to dig into this ourselves properly, > > however, your description perfectly matches with my preliminary > > conclusion for one week ago: > >> the dangling pointer is in **all** bug reports known so far in the > > daemon pointer > > (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367) > > > > The addition of TCP forks being able to add cache replies in dnsmasq > > v2.81 is what is causing this code (which has been like this for along > > time) to become problematic. I recently looked at the same code but > > missed the chance that the byte may be still in the pipe. This is a > > very good catch! > > > > I'm writing this mail to raise awareness that this appears to be a high > > severity bug as it can easily lead to anything between indeterminable > > behavior to fatal failures (typically the case). Even when the bug can > > only be triggered under certain conditions, it recently caused over 200 > > posts on the Pi-hole issue ticket system (Github). Crashes due to this > > have been reported independently by several (more than 20) individual > > users. > > Best > > Dominik > > > > On Wed, 2020-06-17 at 19:52 -0700, Frank wrote: > >> Hello, > >> > >> This patch fixes a buffer overflow in TCP requests. Since the read is > >> not actually being retried, the byte written by the child can be left > >> in the pipe. When that happens, cache_recv_insert() reads the length > >> of the name, which is now multiplied by 256 due to the extra 0 byte > >> (8 bit shift) and results in daemon->namebuff being overflowed. > >> Namebuff is immediately before the daemon struct in memory so it ends > >> up corrupting the beginning of the daemon struct. > >> > >> diff --git a/src/dnsmasq.c b/src/dnsmasq.c > >> index 6481de8..457dea3 100644 > >> --- a/src/dnsmasq.c > >> +++ b/src/dnsmasq.c > >> @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now) > >> single byte comes back up the pipe, which > >> is sent by the child after it has closed > >> the > >> netlink socket. */ > >> - retry_send(read(pipefd[0], &a, 1)); > >> + while (retry_send(read(pipefd[0], &a, 1))); > >> #endif > >> break; > >> } > >> @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now) > >> #ifdef HAVE_LINUX_NETWORK > >> /* See comment above re netlink socket. */ > >> close(daemon->netlinkfd); > >> - retry_send(write(pipefd[1], &a, 1)); > >> + while (retry_send(write(pipefd[1], &a, 1))); > >> #endif > >> } > >> > >> Frank > >> > >> > >> _______________________________________________ > >> Dnsmasq-discuss mailing list > >> Dnsmasq-discuss@lists.thekelleys.org.uk > >> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss > > > > > > _______________________________________________ > > Dnsmasq-discuss mailing list > > Dnsmasq-discuss@lists.thekelleys.org.uk > > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss > > > _______________________________________________ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss