On 28/03/2022 11:49, Ercolino de Spiacico wrote:

The Internet list above (https://hosts.oisd.nl) is 40MB uncompressed. The regex extracts only the domains, shrinking it to 60% of its original size, and gzip compression shrinks it much further. Decompressing it and running it through the script of course takes time.


All of that looks like stuff which can be done before stopping dnsmasq, right? So how long it takes makes no difference to how long DNS and DHCP service is interrupted for?
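
As a rough sketch of doing that preparation ahead of time: the Python below extracts domains from hosts-style text and writes them gzip-compressed through a temporary file, so the finished list only appears under its final name once it is complete. The line format (an address followed by a domain), the regex and the function names are assumptions for illustration; the actual script and regex used on the router are not shown in this thread.

```python
import gzip
import os
import re
import tempfile

# Assumed hosts-style line: a sink address, whitespace, then a domain name,
# optionally followed by a trailing comment.
HOSTS_LINE = re.compile(
    r"^\s*(?:0\.0\.0\.0|127\.0\.0\.1)\s+([A-Za-z0-9._-]+)\s*(?:#.*)?$"
)


def extract_domains(text):
    """Return the domain from each hosts-style line, skipping everything else."""
    domains = []
    for line in text.splitlines():
        match = HOSTS_LINE.match(line)
        if match:
            domains.append(match.group(1))
    return domains


def write_blocklist(domains, dest_path):
    """Write one domain per line, gzip-compressed, then rename into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    os.close(fd)
    try:
        with gzip.open(tmp_path, "wt", encoding="ascii") as out:
            for domain in domains:
                out.write(domain + "\n")
        # The rename is atomic on the same filesystem, so a reader never
        # sees a half-written list.
        os.replace(tmp_path, dest_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Run this way, the download, extraction and compression all finish while the old dnsmasq is still answering queries, and only then is the restart triggered.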


I just reported what's done so that you can try to replicate it, that's all. So, going straight to the point:

root@sparrow:/# zcat /mnt/USB/adblock/adblock.domains.gz | wc -l
658839

root@sparrow:/# time service dnsmasq restart
.............................................................................................................................................................
Done.
real    0m 15.76s
user    0m 0.01s
sys     0m 0.00s

But please note, at this point the process is still owned by root:

root@sparrow:/# ps | grep [d]nsmasq
31137 root     13264 S    dnsmasq -c 4096 --log-async

Until the process is handed over to the user nobody (15 more seconds, I'd say), name resolution is not performed. So 30 seconds in total? And this is with a list of 650K records, well below the maximum of 1.8M we managed to load on the same router with an unscripted config.
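
Since the second 15 seconds is an estimate, one way to put a number on it is to poll for name resolution right after the restart command returns. The following is only a sketch: `resolves` and `seconds_until_ready` are hypothetical helper names, and the probe assumes the system resolver on the machine running it points at the dnsmasq instance being tested.

```python
import socket
import time


def resolves(name="example.com"):
    """Return True if the system resolver answers for name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def seconds_until_ready(probe=resolves, timeout=120.0, interval=0.5,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it succeeds; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Calling `seconds_until_ready()` immediately after `service dnsmasq restart` returns would then give the length of the remaining gap, to within the polling interval and however long a failed lookup takes to give up.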

It looks like your script, which downloads the blocked domains file and compresses it, takes 15s; then dnsmasq takes 15s to uncompress the list, load it into memory and sort it.

The first delay can be solved by doing the download before stopping the old dnsmasq process. The second is amenable to the new option to SIGTERM the old dnsmasq _after_ parsing the new config.



Right! Could you please share more details on this idea? It could be a smart workaround indeed.



When dnsmasq starts up, it does roughly the following things:

1) Read config files (including executing script-conf).
2) Organise the data from the config into in-memory data structures.
3) Open listening sockets on DNS and DHCP ports.
4) Enter the event loop.

1) and 2) are likely to take an appreciable amount of time with big block lists. 3) and 4) never do.

A restart of dnsmasq consists of:

1) Send SIGTERM to the existing dnsmasq process to cause it to halt.
2) Run new dnsmasq instance, which goes through the four steps above.

If reading the config takes time, that causes service interruption. Note that you can't do those two steps in the opposite order, since the old dnsmasq process will still be listening on the DNS and DHCP ports, and the new one will fail to start up.

If we add an option to dnsmasq which takes a process-ID and sends a SIGTERM to that process-ID as step 2.5, the old dnsmasq process can continue to run while the new one parses its options; it then gets torn down just before the new instance opens its listening sockets.

The restart now just runs the new dnsmasq instance, passing the PID of the old dnsmasq instance if it exists.
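
The proposed ordering can be sketched in a few lines of Python. This is an illustration of the sequence only, not dnsmasq code: `start_with_handover`, `load_config` and `open_sockets` are hypothetical names, the option itself is not named in this thread, and the sketch does not model waiting for the old process to exit and release its ports.

```python
import os
import signal


def start_with_handover(load_config, open_sockets, old_pid=None, kill=os.kill):
    """Parse the config first, then stop the old process, then bind sockets."""
    # Steps 1 and 2: the slow part with big block lists. The old process
    # is still serving DNS and DHCP while this runs.
    config = load_config()

    # Step 2.5: only now ask the old instance to terminate.
    if old_pid is not None:
        try:
            kill(old_pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the old process has already gone away

    # Step 3: open the listening sockets (the event loop, step 4, would follow).
    return open_sockets(config)
```

The point of the ordering is that the window with no listener shrinks from "however long parsing takes" to the time between the SIGTERM and the new sockets being opened.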

OK, understood. Interesting; it raises 3 questions on my side:

1- say the old process is already handling 1M blocked domains; I suppose this means that for the duration of the restart the RAM demand will rise to the equivalent of 2M domains (1M old + 1M new), right?

Right.

2- this will certainly help the restart, but there's nothing that can save us from the first ever start after a device reboot, I suppose?

No.

3- Is there any option to hand over the existing resolution cache between the "leaver" and "joiner" process? Like a memory mapping swap or something?

It could be done, but it's a huge rewrite.

Simon.


_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

