On 6/13/25 13:30, Roman Bogorodskiy wrote:
Hi,

I've noticed an issue on FreeBSD which I can reproduce this way:

# ./src/dnsmasq --interface=bridge0 --except-interface=lo0 
--dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 
--dhcp-script=/usr/bin/true
$  ps aux|grep dnsm
nobody     12741    0,0  0,0    14500    3128  -  I    13:43             
0:00,00 ./src/dnsmasq --interface=bridge0 --except-interface=lo0 
--dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 
--dhcp-script=/usr/bin/true
root       12742    0,0  0,0    14500    3008  -  I    13:43             
0:00,00 ./src/dnsmasq --interface=bridge0 --except-interface=lo0 
--dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 
--dhcp-script=/usr/bin/true
novel      12763    0,0  0,0    14192    2588  1  S+   13:44             
0:00,00 grep dnsm
$
# kill 12741
$ ps aux|grep dns
root       12742    0,0  0,0    14500    3008  -  I    13:43             
0:00,00 ./src/dnsmasq --interface=bridge0 --except-interface=lo0 
--dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 
--dhcp-script=/usr/bin/true
novel      12785    0,0  0,0    14192    2560  1  S+   13:45             
0:00,00 grep dns
$

There is a leftover process. When I attach to it using gdb, I see:

(gdb) attach 12742
Attaching to program: /usr/home/novel/code/dnsmasq/src/dnsmasq, process 12742
Reading symbols from /lib/libc.so.7...
Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...
Reading symbols from /lib/libsys.so.7...
Reading symbols from /usr/lib/debug//lib/libsys.so.7.debug...
Reading symbols from /libexec/ld-elf.so.1...
Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...
_read () at _read.S:4
4       PSEUDO(read)
(gdb) bt
#0  _read () at _read.S:4
#1  0x00000000002208a1 in read_write (fd=19, packet=0x8204deea8 
"\260\236\212\"\b", size=112, rw=1) at util.c:783
#2  0x000000000024e6ca in create_helper (event_fd=16, err_fd=18, uid=0, gid=0, 
max_fd=1877346) at helper.c:199
#3  0x000000000023b1f1 in main (argc=5, argv=0x8204df170) at dnsmasq.c:743
(gdb)

So it looks like it's stuck reading from pipefd[0]:

(gdb) fr 2
#2  0x000000000024e6ca in create_helper (event_fd=16, err_fd=18, uid=0, gid=0, 
max_fd=1877346) at helper.c:199
199           if (!read_write(pipefd[0], (unsigned char *)&data, sizeof(data), 
RW_READ))
(gdb)

It also looks like both fds are open on the helper side:

(gdb) p pipefd
$12 = {19, 20}
(gdb)

(gdb) call fcntl(20, 1)
$13 = 0
(gdb)
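
For reference, command 1 in the fcntl() call above is F_GETFD on both FreeBSD and Linux, and any return value other than -1 means the descriptor is still open. A minimal sketch of the same check in C; fd_is_open() and demo_fd_is_open() are hypothetical names used only for illustration and are not part of dnsmasq:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is an open descriptor, 0 otherwise. */
int fd_is_open(int fd)
{
  /* F_GETFD only reads the descriptor flags; it fails with EBADF
     when fd does not refer to an open descriptor. */
  return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Usage: both ends of a fresh pipe are open; after close() the
   write end is reported as closed while the read end is not. */
void demo_fd_is_open(void)
{
  int p[2];

  assert(pipe(p) == 0);
  assert(fd_is_open(p[0]) && fd_is_open(p[1]));
  close(p[1]);
  assert(!fd_is_open(p[1]));
  assert(fd_is_open(p[0]));
  close(p[0]);
}
```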

Now if I close(20):

(gdb) call close(20)
$14 = 0
(gdb) c
Continuing.
[Inferior 1 (process 12742) exited normally]
(gdb)
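
The behaviour seen in gdb follows from how pipes signal end-of-file: read() returns 0 only once every descriptor for the write end has been closed, in every process that holds one. A minimal sketch of that, assuming nothing about the dnsmasq code itself. The roles are swapped relative to dnsmasq purely so the function can return a result to its caller: the forked child plays the main process that goes away, and the caller plays the helper. reader_sees_eof() is a hypothetical name:

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the reader sees EOF once the other process is gone,
   0 if a read would still block, -1 on error. */
int reader_sees_eof(int close_write_end_in_reader)
{
  int pipefd[2], ready, result;
  struct pollfd pfd;
  pid_t pid;

  if (pipe(pipefd) == -1)
    return -1;

  if ((pid = fork()) == -1)
    return -1;

  if (pid == 0)
    {
      /* Stand-in for the main process: it holds a write end and then
         disappears, which closes its copy of that descriptor. */
      close(pipefd[0]);
      _exit(0);
    }

  /* Stand-in for the helper: optionally drop our own copy of the write end. */
  if (close_write_end_in_reader)
    close(pipefd[1]);

  waitpid(pid, NULL, 0); /* the other process has now exited */

  /* Use poll() with a timeout so the demo cannot hang the way the helper did. */
  pfd.fd = pipefd[0];
  pfd.events = POLLIN;
  ready = poll(&pfd, 1, 200);

  if (ready == 0)
    result = 0; /* a write end is still open somewhere: read() would block */
  else if (ready > 0)
    {
      char c;
      result = (read(pipefd[0], &c, 1) == 0) ? 1 : -1; /* 0 bytes means EOF */
    }
  else
    result = -1;

  close(pipefd[0]);
  if (!close_write_end_in_reader)
    close(pipefd[1]);
  return result;
}
```

With the write end closed in the reader, reader_sees_eof(1) reports EOF; with it left open, reader_sees_eof(0) reports that the read would still block, which matches the stuck helper in the backtrace.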


So the following change fixed this for me:

--- a/src/helper.c
+++ b/src/helper.c
@@ -96,6 +96,8 @@ int create_helper(int event_fd, int err_fd, uid_t uid, gid_t gid, long max_fd)
        close(pipefd[0]); /* close reader side */
        return pipefd[1];
      }
+  else
+      close(pipefd[1]);

    /* ignore SIGTERM and SIGINT, so that we can clean up when the main process gets hit
       and SIGALRM so that we can use sleep() */


FWIW, that's happening on FreeBSD 15.0-CURRENT amd64 and latest master
of dnsmasq.

However, I'm not sure that these reproduction steps are 100% sufficient.
I wasn't able to reproduce that on another FreeBSD 14.2-RELEASE amd64
system with Dnsmasq version 2.91.


I'm not sure what the bug is, but I'm very suspicious of commit 8a5fe8ce6bb6c2bd81f237a0f4a2583722ffbd1c, even though it's in the 2.91 codebase.

The write side of the pipe in the helper process is supposed to be closed by the call

close_fds(max_fd, pipefd[0], event_fd, err_fd);

at line 134 of src/helper.c.

That call should close() ALL open fds except STDIN, STDOUT and STDERR, and the three fds passed in as arguments. This preserves the read side, as pipefd[0] is one of the arguments, but the write side should be closed. I checked on Linux (which doesn't exhibit the bug) and that's exactly what happens.

If you look at the code for close_fds(), there are two code paths. The dumb one calls close() for every possible fd between zero and the system maximum, except for the six which are to be spared. The smart one reads a directory which lists the fds that are actually open, and closes only those.

The smart path saves a lot of work on servers which are configured to support enormous numbers of open files per process.

The smart path used to exist only on Linux, but was introduced on BSD during the 2.91 development at the end of 2024. I suspect that change is the cause of the regression.

The smart path is the same for Linux and BSD, except that the directory full of links to open files is at /proc/self/fd on Linux and /dev/fd on *BSD. If these directories don't exist, the code falls back to the dumb code path.
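
To make the two paths concrete, here is a rough sketch of the approach described above. It is an illustration written from this description, not the actual close_fds() from dnsmasq: close_fds_sketch(), spared() and demo_close_fds() are hypothetical names, and trying both directories in turn at run time (rather than choosing one per platform) is an assumption made to keep the example portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* STDIN, STDOUT, STDERR and the three caller-supplied fds are never closed. */
static int spared(long fd, int a, int b, int c)
{
  return fd == STDIN_FILENO || fd == STDOUT_FILENO || fd == STDERR_FILENO ||
         fd == a || fd == b || fd == c;
}

/* Illustrative stand-in for close_fds(); NOT the dnsmasq implementation. */
void close_fds_sketch(long max_fd, int spare1, int spare2, int spare3)
{
  static const char *const fd_dirs[] = { "/proc/self/fd", "/dev/fd" };
  DIR *d = NULL;
  size_t i;
  long fd;

  for (i = 0; i < sizeof(fd_dirs) / sizeof(fd_dirs[0]) && !d; i++)
    d = opendir(fd_dirs[i]);

  if (d)
    {
      /* Smart path: visit only the fds the directory lists as open. */
      struct dirent *de;
      int dir_fd = dirfd(d); /* the directory's own fd stays open until closedir() */

      while ((de = readdir(d)))
        {
          char *end;

          fd = strtol(de->d_name, &end, 10);
          if (end == de->d_name || *end != '\0')
            continue; /* skip "." and ".." */
          if (fd != dir_fd && !spared(fd, spare1, spare2, spare3))
            close((int)fd);
        }
      closedir(d);
      return;
    }

  /* Dumb path: try every possible fd below max_fd. */
  for (fd = 0; fd < max_fd; fd++)
    if (!spared(fd, spare1, spare2, spare3))
      close((int)fd);
}

/* Usage: run in a child so the caller's own fds are untouched.
   Returns 1 if the read end survived and the write end was closed. */
int demo_close_fds(void)
{
  int status;
  pid_t pid = fork();

  if (pid == -1)
    return -1;
  if (pid == 0)
    {
      int p[2];

      if (pipe(p) == -1)
        _exit(2);
      close_fds_sketch(1024, p[0], -1, -1);
      _exit(fcntl(p[0], F_GETFD) != -1 && fcntl(p[1], F_GETFD) == -1 ? 0 : 1);
    }
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Note that the smart path can only close what the directory listing reports, so a check like demo_close_fds() on the affected system is one way to see whether the write end is actually being visited.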

So, can you try to determine why close_fds() is not closing the write side of the pipe in the helper process? It should already be doing what your patch does.


Cheers,

Simon.






Thanks,
Roman

_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss


