Hi NSD developers and users,

I've observed a situation with NSD that I think deserves some attention, and perhaps some kind of fix.

We have a server with 32GB of RAM. When we start NSD, it loads all the zones and happily serves them, using close to 15GB of RAM. After a while, it gets a NOTIFY for a zone and AXFRs the zone, saving the XFR in /var/lib/nsd/nsd-xfr-5231. It then tries to apply the update, and this is when it all goes wrong. NSD's method of updating is to fork itself, have the child reload the changed zone(s), and take over from the parent. While forking, NSD temporarily uses double the amount of RAM, so on this server the fork fails because of memory shortage.
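To make the failure mode concrete, here is a minimal sketch of the general fork-and-reload pattern. This is not NSD's actual code: reload_by_fork and load_zones are hypothetical names, and the child here simply exits after loading instead of taking over from the parent. The point is only that when fork() itself fails with ENOMEM, no child exists to do the reload, and the caller is left with the old data.

```python
import errno
import os


def reload_by_fork(load_zones, fork=os.fork):
    """Fork a child to load updated zones and report what happened.

    Hypothetical illustration only; `load_zones` stands in for whatever
    work the child would do. `fork` is injectable so the failure path
    can be exercised without exhausting real memory.
    """
    try:
        pid = fork()
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            # The kernel refused to create the child: nothing was reloaded.
            return "fork failed: continuing with old database"
        raise
    if pid == 0:
        # Child: load the changed zones, then exit without cleanup handlers.
        load_zones()
        os._exit(0)
    # Parent: wait for the child and check its exit status.
    _, status = os.waitpid(pid, 0)
    return "reloaded" if status == 0 else "reload failed"
```

Passing a stand-in for fork that raises OSError(errno.ENOMEM, ...) takes the first return path, which corresponds to the "fork failed: Cannot allocate memory" situation described above.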

The log shows this:

[2022-03-30 15:16:27.986] nsd[5299]: error: fork failed: Cannot allocate memory
[2022-03-30 15:16:28.355] nsd[45999]: error: handle_reload_cmd: reload closed cmd channel
[2022-03-30 15:16:28.355] nsd[45999]: warning: Reload process 5299 failed, continuing with old database
[2022-03-30 15:16:28.355] nsd[5231]: error: process 5299 exited with status 256
[2022-03-30 15:16:29.776] nsd[45999]: error: fork failed: Cannot allocate memory
[2022-03-30 15:16:30.149] nsd[46012]: error: handle_reload_cmd: reload closed cmd channel
[2022-03-30 15:16:30.149] nsd[46012]: warning: Reload process 45999 failed, continuing with old database
[2022-03-30 15:16:31.748] nsd[46013]: error: handle_reload_cmd: reload closed cmd channel
[2022-03-30 15:16:31.748] nsd[46013]: warning: Reload process 46012 failed, continuing with old database

After this, there are no more log entries about trying to reload the database.

PID 5231 is the xfrd process, and 5299 was the master that coordinates things. Now, the situation looks like this:

# systemctl status nsd
● nsd.service - NSD DNS Server
   Loaded: loaded (/usr/lib/systemd/system/nsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2022-01-04 12:07:30 UTC; 2 months 28 days ago
 Main PID: 5231 (nsd: xfrd)
   CGroup: /system.slice/nsd.service
           ├─ 5231 /usr/sbin/nsd -d
           ├─46013 /usr/sbin/nsd -d
           ├─46016 /usr/sbin/nsd -d
           └─46024 /usr/sbin/nsd -d

So we are in a state where the xfrd process is running and keeps doing zone transfers, which slowly accumulate in /var/lib/nsd/nsd-xfr-5231 and will eventually fill up the disk. The child processes are still running and serving queries, but there is no master process to apply the transfers, so the zones they serve are now outdated. Log file rotation is also broken: when I run "nsd-control log_reopen", no new log file is created, so the log file will keep growing until it, too, fills up the disk. Essentially, NSD is crippled, and only a restart will get it out of this broken state.
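Until there is a proper fix, one way to notice this state before the disk fills up is to watch how much data has piled up in the transfer directory. The following is a rough sketch, not anything NSD ships: the function names and the size limit are my own assumptions, and the path is just the one from this server.

```python
import os


def dir_size_bytes(path):
    """Return the combined size of all files under `path`.

    Symlinks are not followed; files that disappear while we are
    walking the tree are skipped.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished between listing and stat; ignore it.
                pass
    return total


def xfr_backlog_too_large(path, limit_bytes):
    """True if the stored transfers under `path` exceed `limit_bytes`."""
    return dir_size_bytes(path) > limit_bytes
```

For example, a cron job could call xfr_backlog_too_large("/var/lib/nsd/nsd-xfr-5231", 1 << 30) and raise an alert when it returns True. The 1 GiB limit is arbitrary and would need tuning to the zone sizes involved.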

The easiest way to prevent this is to add RAM to the server. In my opinion, though, this is a waste of resources, and it may not be trivial to do. It might be easy on a virtual server, but with a physical server, one needs to buy RAM, shut down the server and add the memory modules. In this area, I find NSD to be deficient. Other name servers handle their memory differently, and make incremental use of memory as zones are added.

A question for the developers is: is there any way to make NSD handle zone reloads more efficiently rather than doing this fork/reload?

Regards,
Anand
_______________________________________________
nsd-users mailing list
nsd-users@lists.nlnetlabs.nl
https://lists.nlnetlabs.nl/mailman/listinfo/nsd-users
