I love dtrace, but I seriously doubt that it could be used it for this. This is 
the absolutely last code that executes at kernel shutdown/reboot. All other 
processes are terminated when this is happening...

Basically the code done in kern_reboot() (and stuff it calls) in 
/usr/src/sys/kern/kern_shutdown.c

This code in kern_reboot():
...
>      EVENTHANDLER_INVOKE(shutdown_pre_sync, howto);
>
>        /*
>         * Now sync filesystems
>         */
>        if (!cold && (howto & RB_NOSYNC) == 0 && once == 0) {
>                once = 1;
>                bufshutdown(show_busybufs);
>        }
>
>        print_uptime();
>
>        cngrab();
>
>        /*
>         * Ok, now do things that assume all filesystem activity has
>         * been completed.
>         */
>        EVENTHANDLER_INVOKE(shutdown_post_sync, howto);
>
>        if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
>                doadump(TRUE);
>
>        /* Now that we're going to really halt the system... */
>        EVENTHANDLER_INVOKE(shutdown_final, howto);

What I’ve seen is that the three different EVENTHANDLER_INVOKE() all can cause 
a lot of calls that can take a long time. As can bufshutdown(), probably when 
it calls vfs_unmountall() to unmount all filesystems.

We’re talking like 5-20 minutes (or more). I’ve not really timed it.

I have a feeling it sometimes is due to transactions stored up in the ZFS 
filesystems that the system tries to “run” so it doesn’t have to do it at 
mount() time after the reboot. If I reset the machine while it is busy doing 
this then sometimes the “zfs mount -a” takes very long time at boot instead.

Hmm.. I should be able to use the getnanouptime() call to get a “clock” to look 
for (it’s used in print_uptime()). As long as the clock isn’t stopped at this 
time in the shutdown sequence atleast :-)

*Time to write some code and test this* :-)

- Peter


> On 29 Nov 2019, at 22:09, Enji Cooper <yaneurab...@gmail.com> wrote:
> 
> 
>> On Nov 28, 2019, at 12:52, Peter Eriksson <p...@lysator.liu.se> wrote:
>> 
>> I’ve been looking into the “kernel looks to be hung at reboot” problem at 
>> bit. Adding a lot of printf() calls to the relevant parts it looks like it 
>> actually isn’t hung but busy unmounting filesystems (which we have thousands 
>> of), flushing disk caches, calling registered callbacks and stuff and 
>> sometimes it takes a bit longer than usual - probably due to ZFS having 
>> stuff queued up that needs to be written to disk before it finishes…
>> 
>> Anyway, I’d like to propose that we add some kind of counter/printf() calls 
>> in that code area so we can see that things are indeed progressing. However, 
>> I’d probably prefer not to print _every_ filesystem (or registered callbacks 
>> - a lot of those...) - that generates a lot of output (tried that :-) but 
>> something like:
>> 
>> Unmounting filesystems:
>> 1234 done
>> (With the "1234 done” updated something like once per second).
>> 
>> What’s the right/best way to do that from the kernel? In user space I’d just 
>> call time(&t) at some convenient points and only print something if “t” has 
>> changed. :-)
> 
> Can you use DTrace instead to get the information you need?
> Cheers,
> -Enji

_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to