Peter 'PMc' Much wrote:
> A while back I upgraded an internal server from 9.18 to 9.20.
> Everything works fine there.
>
> Now I upgraded a public server (rented KVM). Same config, same
> infrastructure. It probably works, but there is no dnstap output, so I
> am blind.
>
> I put a hexdump on the socket like so (before starting named):
>
> root@wand:/tmp # /usr/bin/stdbuf -o 0 /usr/local/bin/fstrm_capture \
> -t protobuf:dnstap.Dnstap -u /var/named//var/run/dnstap.sock -w - | hd
> fstrm_capture: opening Unix socket path /var/named//var/run/dnstap.sock
> fstrm_capture: opened output file -
> 00000000 00 00 00 00 00 00 00 22 00 00 00 02 00 00 00 01 |......."........|
> 00000010 00 00 00 16 70 72 6f 74 6f 62 75 66 3a 64 6e 73 |....protobuf:dns|
>
> And that is all that appears.
> I delete 9.20 and reinstall/restart 9.18, and immediately some
> proper hexdump output follows up.

If you invoke fstrm_capture with the parameter -ddddd, it will print
all of its debug log messages. You should see a line
"fstrm_capture: accepted new connection fd [...]" once a dnstap client
has successfully connected to the socket. That message is not printed
at the default debug level (no -d's).

You should also be able to confirm that the permissions are correct by
switching to your 'bind' user and attempting to connect to the dnstap
socket with a utility like socat while fstrm_capture is running, e.g.:

    # su -m bind -c "socat - UNIX-CONNECT:/var/named/var/run/dnstap.sock"

socat will print system errors (permission denied, no such file or
directory, etc.) if there is a problem connecting to the socket. If
there is no output from socat and fstrm_capture -ddddd prints
"accepted new connection", then the connection attempt succeeded and
your permissions are correct.
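
If socat is not installed, the same check can be scripted. The
following is only a minimal sketch in Python; try_connect() is a
hypothetical helper name of my own, not part of fstrm or BIND. It
attempts a stream connection to a UNIX socket and returns the errno
name if the connection fails.

```python
import errno
import socket


def try_connect(path):
    """Try a UNIX stream connect to `path`.

    Returns None on success, otherwise the errno name (e.g. "EACCES").
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return None
    except OSError as e:
        # Map the numeric errno to its symbolic name where one is known.
        return errno.errorcode.get(e.errno, str(e))
    finally:
        s.close()


# Example (path taken from the socat command above):
#   print(try_connect("/var/named/var/run/dnstap.sock"))
```

Run it as the 'bind' user while fstrm_capture is listening. A result of
None means the connection was accepted; EACCES or ENOENT would point at
a permission or path problem respectively.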

However, it seems to me that there might be a race condition in BIND
when running on FreeBSD.

fstrm_iothr_init() in the fstrm library is the function that is
responsible for making dnstap UNIX socket connection attempts. It looks
like BIND reaches that function via the function call sequence:

    main()
      setup()
        named_server_create()
          isc_loop_setup(...run_server...)
            run_server()
              load_configuration()
                configure_view()
                  configure_dnstap()
                    dns_dt_create()
                      fstrm_iothr_init()

fstrm_iothr_init() spawns the background I/O processing thread that is
responsible for (re)opening the dnstap socket and writing queued data
into it. It does not block its caller on the initial connection attempt
(that is performed by the spawned thread), nor would we want it to,
since the thread may make reconnection attempts later during runtime if
the socket is disconnected (e.g. if fstrm_capture is restarted). So, the
caller should probably make sure that the process has had its UID/GID
changed prior to calling fstrm_iothr_init(), so that the same filesystem
permissions apply to the initial connection attempt and to any
subsequent attempts.

On FreeBSD, it looks like BIND changes UID/GID via the function call
sequence:

    main()
      setup()
        named_server_create()
          isc_loop_setup(...run_server...)
            run_server()
              load_configuration()
                named_os_changeuser()

In load_configuration(), the calls to named_os_changeuser() occur
*after* the calls to configure_view(), which is how fstrm_iothr_init()
is reached. So, it could be that libfstrm's background I/O thread makes
its initial socket connection attempt with root privileges, before
named_os_changeuser() has performed the setuid(), in which case the
filesystem permissions don't matter. Or it could be that the background
I/O thread is delayed from executing long enough that
named_os_changeuser() has already run, in which case the filesystem
permissions must be correct in order for the socket connection attempt
to succeed.
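
As a toy illustration of those two outcomes (this is not BIND or
libfstrm code, and the function name and delay parameters are made up
for the example), the ordering can be modelled with a thread that
records which identity is in effect when it makes its first connection
attempt:

```python
import threading
import time


def simulate(io_thread_delay, changeuser_delay):
    """Toy model: return the identity the first connect attempt runs as."""
    state = {"user": "root"}   # the process starts out as root
    seen = {}
    lock = threading.Lock()

    def io_thread():
        # Stands in for libfstrm's background thread opening the socket.
        time.sleep(io_thread_delay)
        with lock:
            seen["first_connect_as"] = state["user"]

    # Stands in for fstrm_iothr_init(): spawn the thread, return at once.
    t = threading.Thread(target=io_thread)
    t.start()

    # Stands in for the remainder of load_configuration() up to the
    # point where named_os_changeuser() drops privileges.
    time.sleep(changeuser_delay)
    with lock:
        state["user"] = "bind"

    t.join()
    return seen["first_connect_as"]
```

With a prompt I/O thread (simulate(0.0, 0.3)) the first attempt runs as
"root"; with a delayed one (simulate(0.3, 0.0)) it runs as "bind", and
only then do the socket's filesystem permissions come into play.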

On a Linux system with BIND built against libcap, it looks like the
setuid() occurs much earlier in the process lifetime via the function
call sequence:

    main()
      setup()
        named_os_minprivs()   <--- occurs before named_server_create()
          named_os_changeuser()

That would complete prior to fstrm_iothr_init() being invoked, so there
would not be such a race condition on a Linux system.

You might be able to check whether this is the case by tracing the
system calls of the BIND process (and its threads) with a utility like
truss/dtrace. On the other hand, if it really is a race condition, the
act of tracing might disturb the process just enough to change the
outcome of the race.

--
Robert Edmonds