On 28/03/2022 11:49, Ercolino de Spiacico wrote:
This Internet list above (https://hosts.oisd.nl) is 40MB
uncompressed; the regex extracts only the domains, shrinking it to 60%
of its original size, and gzip compression shrinks it much further.
Decompressing and scripting it up of course takes time.
All of that looks like stuff which can be done before stopping
dnsmasq, right? So how long it takes makes no difference to how long
DNS and DHCP service is interrupted for.
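
A minimal sketch of that ordering, in Python rather than the poster's own script (which is not shown in this thread): build the compressed domain list in a temporary file, swap it into place, and only then restart dnsmasq. The regex, the function names and the assumption that the source is in hosts format ("0.0.0.0 example.com") are illustrative guesses, not taken from the original setup.

```python
import gzip
import os
import re
import tempfile

# Assumed hosts-file line shape: "<ip> <domain>", optional trailing comment.
HOSTS_LINE = re.compile(r"^\s*(?:0\.0\.0\.0|127\.0\.0\.1)\s+([A-Za-z0-9._-]+)")

def extract_domains(hosts_text):
    """Return the unique domains named in hosts-format text, sorted."""
    domains = set()
    for line in hosts_text.splitlines():
        match = HOSTS_LINE.match(line)
        if match:
            domains.add(match.group(1).lower())
    return sorted(domains)

def prepare_blocklist(hosts_text, dest_path):
    """Write a gzip'd domain list to dest_path via a rename.

    Nothing here touches the running dnsmasq; only the restart that
    follows this step interrupts service.
    """
    domains = extract_domains(hosts_text)
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    os.close(fd)
    try:
        with gzip.open(tmp_path, "wt", encoding="ascii") as out:
            out.writelines(domain + "\n" for domain in domains)
        os.replace(tmp_path, dest_path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return len(domains)
```

Writing to a temporary file and renaming it means a restart never sees a half-written list, however long the download and compression take.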
I just reported what's done so that you can try to replicate it, that's all.
So going straight to the point:
root@sparrow:/# zcat /mnt/USB/adblock/adblock.domains.gz | wc -l
658839
root@sparrow:/# time service dnsmasq restart
.............................................................................................................................................................
Done.
real 0m 15.76s
user 0m 0.01s
sys 0m 0.00s
But please note, at this point the process is still owned by root
root@sparrow:/# ps | grep [d]nsmasq
31137 root 13264 S dnsmasq -c 4096 --log-async
Until the process drops privileges to nobody (15 more seconds, I'd say), name
resolution is not performed. So 30 seconds in total? And this is with a list
of 650K records, well below the 1.8M maximum we managed to load on the same
router with an unscripted config.
It looks like your script which downloads the blocked domains file and
compresses it takes 15s, then dnsmasq takes 15s to uncompress the list
and load it into memory and sort.
The first delay can be solved by doing the download before stopping the
old dnsmasq process. The second is amenable to the new option to SIGTERM
the old dnsmasq _after_ parsing the new config.
Right! Could you please share more details on this idea? It could be
a smart workaround indeed.
When dnsmasq starts up, it does roughly the following things:
1) Read config files (including executing script-conf).
2) Organise the data from the config into in-memory data structures.
3) Open listening sockets on DNS and DHCP ports.
4) Enter the event loop.
1) and 2) are likely to take an appreciable amount of time with big
block lists. 3) and 4) never do.
A restart of dnsmasq consists of
1) send SIGTERM to existing dnsmasq process to cause it to halt.
2) Run new dnsmasq instance, which goes through the four steps above.
If reading the config takes time, that causes service interruption.
Note that you can't do those two steps in the opposite order, since
the old dnsmasq process will still be listening on the DNS and DHCP
ports, and the new one will fail to start up.
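
The port conflict can be shown with a small Python sketch. It is only an illustration: it uses an OS-chosen unprivileged UDP port on loopback instead of the real DNS and DHCP ports, plain sockets with no address-reuse options set, and a hypothetical function name.

```python
import socket

def second_bind_fails():
    """Bind one UDP socket, then try to bind another to the same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            return True                     # typically "address already in use"
        return False
    finally:
        first.close()
        second.close()
```

Here `first` plays the old process still holding the port and `second` plays the new instance trying to start.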
If we add an option to dnsmasq which takes a process-ID and sends a
SIGTERM to that process-ID as step 2.5, the old dnsmasq process can
continue to run during the parsing of the options, then it gets
torn down before opening listening sockets.
The restart now just runs the new dnsmasq instance, passing the PID of
the old dnsmasq instance if it exists.
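
The proposed ordering can be sketched in Python as a simulation. This is not dnsmasq code and no such option is assumed to exist yet: `load_config`, `start_with_handover` and `demo` are hypothetical names, a sleeping child process stands in for the old dnsmasq, and POSIX signal semantics are assumed.

```python
import os
import signal
import subprocess
import sys

def load_config(lines):
    """Stand-in for startup steps 1 and 2: parse and organise the config."""
    return sorted({line.strip().lower() for line in lines if line.strip()})

def start_with_handover(config_lines, old_pid=None):
    """Run the startup steps, signalling old_pid between steps 2 and 3."""
    events = []
    blocked = load_config(config_lines)          # steps 1 and 2 (the slow part)
    events.append("config loaded")
    if old_pid is not None:
        os.kill(old_pid, signal.SIGTERM)         # proposed step 2.5
        events.append("old instance signalled")
    # Step 3 (open listening sockets) and step 4 (event loop) would follow.
    events.append("ready to open sockets")
    return blocked, events

def demo():
    """Use a sleeping child process as a hypothetical 'old dnsmasq'."""
    old = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    blocked, events = start_with_handover(
        ["Ads.example.com", "ads.example.com"], old.pid
    )
    exit_code = old.wait(timeout=10)             # negative value: killed by signal
    return blocked, events, exit_code
```

The old process keeps serving for the whole of the slow parsing phase and is only signalled once the new data is in memory. A real implementation would presumably also have to make sure the old process has released its ports before binding, which this sketch does not model.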
OK, understood. Interesting; it raises 3 questions on my side:
1- say the old process is already handling 1M blocked domains; I suppose
this means that for the duration of the restart the RAM demand will be
pushed to the equivalent of 2M domains (1M old + 1M new), right?
Right.
2- this will certainly help the restart, but there's nothing that can
save us from the first ever start after a device reboot, I suppose?
No.
3- Is there any option to hand over the existing resolution cache
between the "leaver" and "joiner" processes? Like a memory mapping swap or
something?
It could be done, but it's a huge rewrite.
Simon.
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss