On 17/02/2020 14:37, Geert Stappers wrote:
> On 17-02-2020 14:31, Donald Sharp wrote:
> 
>> Running:
>>
>> sharpd@eva:~/dnsmasq$ /sbin/dnsmasq --version
>> Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley
> 
> 2018, no short git hashes nor similar indicators of the source version.
> 
> 
>> Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
>> TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
>> ----
>>
>> When I install several hundred thousand routes into the kernel and
>> remove them( or some variation thereof ), dnsmasq eventually ends up
>> running 100% cpu:
>>
>> top - 18:45:18 up 1 day,  7:44,  1 user,  load average: 2.70, 2.65, 2.34
>> Tasks: 424 total,   3 running, 421 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 12.1 us,  6.9 sy,  0.0 ni, 80.2 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
>> MiB Mem :  32131.3 total,  19483.6 free,   6620.3 used,   6027.4 buff/cache
>> MiB Swap:  32718.0 total,  31693.0 free,   1025.0 used.  24698.2 avail Mem
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>  293183 nobody    20   0   11040   2040   1688 R  99.7   0.0 148:48.40 dnsmasq
> 
> 
> The "CPU 100%" made me do  `git log` and a "find" on 'CPU'.  I found
> 
> 
> commit df6636bff61aa53ed7ad4b34d940805193c0bc74
> Author: Florent Fourcot <florent.four...@wifirst.fr>
> Date:   Mon Feb 11 17:04:44 2019 +0100
> 
>     lease: prune lease as soon as expired
>    
>     We detected a performance issue on a dnsmasq running many dhcp sessions
>     (more than 10 000). At the end of the day, the server was only releasing
>     old DHCP leases but was consuming a lot of CPU.
>    
>     It looks like curent dhcp pruning:
>      1) it's pruning old sessions (iterate on all current leases). It's
>      important to note that it's only pruning session expired since more
>      than one second
>      2) it's looking for next lease to expire (iterate on all current leases
>      again)
>      3) it launchs an alarm to catch next expiration found in step 2). This
>      value can be zero for leases just expired (but not pruned).
>    
>     So, for a second, dnsmasq could fall in a "prune loop" by doing:
>      * Not pruning anything, since difftime() is not > 0
>      * Run alarm again with zero as argument
>    
>     On a server with very large number of leases and releasing often
>     sessions, that can waste a very big CPU time.
>    
>     Signed-off-by: Florent Fourcot <florent.four...@wifirst.fr>
> 
> 
> 
> 
>>
>> strace output:
>>
>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
>>     ....
>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=PO^Cstrace: Process
>> 293183 detached
>>
>> I can pretty much make this happen at will.  What can I provide to
>> help debug this?
> 
> Start with stating how recent the source is that you are using.
> 
> 
>>
>> As a side note, I was not placing these routes into the default linux
>> routing table.  Does dnsmasq need to be paying attention to these routes?
> 
> Side notes in a separate thread  please.
> 
> 
>>
>> donald
>>
> 
> Regards
> 
> Geert Stappers
> 

Geert, you're confusing things. It's perfectly clear that the process is
running at 100% CPU because the poll() calls are returning an error which
the code is not expecting and doesn't handle. It just calls poll()
again, and because the error wasn't cleared, poll() returns immediately
again, rinse and repeat.

The solution is to handle the error (it's not obvious to me how to do
that) or to avoid creating the error condition in the first place.
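
For illustration only, here is a minimal sketch of what "handling the
error" could look like in a generic poll() loop. It is not dnsmasq code:
the helper names drain_socket_error() and poll_and_clear_errors() are
made up for this example. It relies on the documented behaviour of
SO_ERROR, which returns and clears the pending error on a socket. Whether
that is enough to stop POLLERR being reported again depends on the socket
type and on what caused the error, so treat that part as an assumption.

```c
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fetch and clear the pending error on a socket.
   Returns the errno-style code (0 if nothing is pending), or -1 if
   getsockopt() itself fails, e.g. because fd is not a socket. */
int drain_socket_error(int fd)
{
  int err = 0;
  socklen_t len = sizeof(err);

  if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
    return -1;

  return err;
}

/* Hypothetical helper: run one poll() pass and, for every descriptor
   that reports POLLERR, read (and thereby clear) its pending error and
   log it. Returns the number of descriptors that reported POLLERR, or
   -1 if poll() fails. */
int poll_and_clear_errors(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
  int handled = 0;
  nfds_t i;

  if (poll(fds, nfds, timeout_ms) == -1)
    return -1;

  for (i = 0; i < nfds; i++)
    if (fds[i].revents & POLLERR)
      {
        int err = drain_socket_error(fds[i].fd);

        fprintf(stderr, "fd %d: POLLERR, pending error: %s\n",
                fds[i].fd, err > 0 ? strerror(err) : "unknown");
        handled++;
      }

  return handled;
}
```

A caller would use poll_and_clear_errors() in place of a bare poll() and
then decide per socket what to do next (for example re-open it), instead
of going straight back into poll() with the error still pending.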

To get further, we need to know which socket is erroring. It's file
descriptor four in the strace, but is that the netlink socket, a DHCP
socket, or a socket used to talk DNS upstream or downstream? We
don't know without further information.
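
One possible way to gather that information on Linux, sketched here as
an assumption rather than as an established procedure for dnsmasq: the
symlink /proc/<pid>/fd/<fd> names what a descriptor refers to, and for a
socket it reads "socket:[inode]". The helper name describe_fd() is
hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the target of /proc/<pid>/fd/<fd> into buf
   as a NUL-terminated string. Returns 0 on success, -1 on failure
   (no such process or descriptor, no permission, or size == 0). */
int describe_fd(pid_t pid, int fd, char *buf, size_t size)
{
  char path[64];
  ssize_t n;

  if (size == 0)
    return -1;

  snprintf(path, sizeof(path), "/proc/%ld/fd/%d", (long)pid, fd);

  /* readlink() does not NUL-terminate, so leave room and do it here. */
  n = readlink(path, buf, size - 1);
  if (n == -1)
    return -1;

  buf[n] = '\0';
  return 0;
}
```

Called as describe_fd(293183, 4, buf, sizeof(buf)) with sufficient
privileges (the process above runs as nobody), this should yield the
socket inode, which can then be looked up in the Inode column of
/proc/net/netlink or the inode column of /proc/net/udp to tell a netlink
socket from a UDP one.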

Simon.


_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
