On Sun, May 15, 2022 at 12:06:33PM +0200, Stephan Mending wrote:
> Hi *,
> I've got a system running -current that keeps crashing on me every couple of
> days.
> Output of ddb:
>
> Connected to /dev/cuaU0 (speed 115200)
>
> ddb{0}> show panic
> the kernel did not panic
> ddb{0}> show uvm
> Current UVM status:
> pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> 482451 VM pages: 43158 active, 132795 inactive, 35 wired, 192336 free (24054 zero)
> min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
> freemin=16081, free-target=21441, inactive-target=0, wired-max=160817
> faults=2487210, traps=2404140, intrs=211883, ctxswitch=1960560 fpuswitch=0
> softint=3499069, syscalls=2015497, kmapent=9
> fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=192470(193514), anget(retries)=603205(0), amapcopy=177151
>
> neighbor anon/obj pg=82033/639788, gets(lock/unlock)=415897/193548
> cases: anon=570367, anoncow=32838, obj=347149, prcopy=67670, przero=1469152
>
> daemon and swap counts:
> woke=0, revs=0, scans=0, obscans=0, anscans=0
> busy=0, freed=0, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=526020, swpginuse=0, swpgonly=0 paging=0
> kernel pointers:
> objs(kern)=0xffffffff8238a038
> ddb{0}> show trace
> No such command
> ddb{0}> trace
> icmp_mtudisc_timeout(fffffd807a50b070,0) at icmp_mtudisc_timeout+0x77
> rt_timer_timer(ffffffff8235d668) at rt_timer_timer+0x1cc
> softclock_thread(ffff8000fffff260) at softclock_thread+0x13b
> end trace frame: 0x0, count: -3
> ddb{0}>
>
> Output of a second crash:
>
> ddb{0}> show panic
> the kernel did not panic
> ddb{0}> trace
> icmp_mtudisc_timeout(fffffd8069f9f700,0) at icmp_mtudisc_timeout+0x77
> rt_timer_timer(ffffffff8231bfc8) at rt_timer_timer+0x1cc
> softclock_thread(ffff8000fffff500) at softclock_thread+0x13b
> end trace frame: 0x0, count: -3
> ddb{0}> show uvm
> Current UVM status:
> pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> 482457 VM pages: 29240 active, 133535 inactive, 35 wired, 205028 free (25630 zero)
> min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
> freemin=16081, free-target=21441, inactive-target=0, wired-max=160819
> faults=687274, traps=693441, intrs=75204, ctxswitch=381252 fpuswitch=0
> softint=615411, syscalls=607703, kmapent=9
> fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=185433(186477), anget(retries)=141598(0), amapcopy=75047
> neighbor anon/obj pg=69895/201703, gets(lock/unlock)=256502/186509
> cases: anon=114948, anoncow=26650, obj=237702, prcopy=17724, przero=290216
> daemon and swap counts:
> woke=0, revs=0, scans=0, obscans=0, anscans=0
> busy=0, freed=0, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=526020, swpginuse=0, swpgonly=0 paging=0
> kernel pointers:
> objs(kern)=0xffffffff82317458
> ddb{0}> show bcstats
> Current Buffer Cache status:
> numbufs 24114 busymapped 0, delwri 5
> kvaslots 6030 avail kva slots 6030
> bufpages 96426, dmapages 96426, dirtypages 20
> pendingreads 0, pendingwrites 0
> highflips 0, highflops 0, dmaflips 0
> ddb{0}> mount
> No such command
> ddb{0}> trace
> icmp_mtudisc_timeout(fffffd8069f9f700,0) at icmp_mtudisc_timeout+0x77
> rt_timer_timer(ffffffff8231bfc8) at rt_timer_timer+0x1cc
> softclock_thread(ffff8000fffff500) at softclock_thread+0x13b
> end trace frame: 0x0, count: -3
>
>
>
> Especially the line stating "the kernel did not panic" surprises me, as I am
> greeted by the kernel debugger. Not sure how to interpret that.
> While looking for the reason behind these "crashes", I noticed that cron is
> constantly running at 99% cpu.
>
> As a first measure I commented out all cron jobs (except for daily, weekly
> and monthly, as I figured these shouldn't pose a problem). But that did not
> remedy the problem. Right after startup cron starts eating away at the CPU.
> Does anybody have an idea how to analyze the issue further (apart from
> recompiling cron and using gdb)?
>
As for cron, please attach ktrace to the cron process for a few seconds
and look at the kdump output. Most likely it is constantly being woken up
for some reason.
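
Something along these lines should do; treat it as a rough sketch, not a
recipe. The helper names trace_cron and count_syscalls and the
/tmp/cron.ktrace path are made up for illustration, and the awk part assumes
kdump's default output layout (pid, command, record type, then the call).

```shell
#!/bin/sh
# Sketch only: trace_cron, count_syscalls and /tmp/cron.ktrace are
# hypothetical names picked for this example.

# Trace the running cron for a few seconds. Run as root on the affected box.
trace_cron() {
    pid=$(pgrep -x cron) || return 1      # assumes exactly one cron process
    ktrace -f /tmp/cron.ktrace -p "$pid"  # attach to the running process
    sleep 5
    ktrace -c -p "$pid"                   # clear the trace points again
    kdump -f /tmp/cron.ktrace             # print the trace in readable form
}

# Read kdump output on stdin and count CALL records per syscall name,
# most frequent first. Assumes the default kdump layout.
count_syscalls() {
    awk '$3 == "CALL" { sub(/\(.*/, "", $4); n[$4]++ }
         END { for (s in n) print n[s], s }' | sort -rn
}
```

Running "trace_cron | count_syscalls | head" as root should show which
syscall cron keeps spinning on; the raw kdump output around that call
(arguments and return values) is the part worth posting.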
--
:wq Claudio