Re: Kernel dumps [was Re: possible changes from Panzura]

Julian Elischer Thu, 11 Jul 2013 07:28:12 -0700

On 7/11/13 6:09 AM, Kevin Day wrote:


Those sound useful.   Just out of curiosity, however, since we're on the topic 
of kernel dumps:  Has anyone even looked into the notion of an emergency 
fall-back network stack to enable remote kernel panic (or system hang) 
debugging, the way OS X lets you do?  I can't tell you the number of times I've 
NMI'd a Mac and connected to it remotely in a scenario where everything was 
totally wedged and just a couple of minutes in kgdb (or now lldb) quickly 
showed that everything was waiting on a specific lock and the problem became 
manifestly clear.

The feature also lets you scrape a panic'd machine with automation, running 
some kgdb scripts against it to glean useful information for later analysis vs 
having to have someone schlep the dump image manually to triage.  It's going to 
be damn hard to live without this now, and if someone else isn't working on it, 
that's good to know too!

I could imagine that we could stash away a vimage stack just for this purpose.

You'd set it up on boot and leave it detached until you need it.

You'd just need to switch the interfaces over to the new stack on panic and put them into 'poll' mode.


Or maybe you'd need more (like pre-allocating mbufs for it to use).

Just an idea.


At a previous employer, we had a system where on a panic it had a totally 
separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to 
a server. This isn’t as nice as full remote debugging, but it was a whole lot 
easier to develop. The caveats I remember were:

1) We didn’t want to implement ARP, so you had to write the MAC address of the 
“dump server” to the kernel via sysctl before crashing.
2) We also didn’t want to have to deal with routing tables, so you had to 
manually specify what interface to blast packets out to, also via sysctl.
3) After a panic we didn’t want to rely on interrupt processing working, so it 
polled the network interface and blocked whenever it needed to. Since this was 
an embedded system, it wasn’t too big of a deal - only one network driver had 
to be hacked to support this. Basically a flag that would switch to “disable 
normal processing, switch to polled fifos for input and output” until reboot.
4) The whole system used only preallocated buffers and its own stack (carved 
out from memory on boot) so even if the kernel’s malloc was trashed, we could 
still dump.
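
To make caveats 3 and 4 concrete, here is a minimal userland sketch in C of what such a design might look like. It is not the code from that system, which isn't shown here. Every name in it (dump_pool, dump_buf_get, dump_buf_put, struct dump_nic, dump_send_polled) and both size constants are made up for illustration, and the static array only stands in for memory that would really be carved out at boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_BUF_SIZE  2048  /* hypothetical: room for one Ethernet frame */
#define DUMP_BUF_COUNT 4     /* hypothetical pool depth */

/* Static storage stands in for memory carved out at boot. */
static uint8_t dump_pool[DUMP_BUF_COUNT][DUMP_BUF_SIZE];
static uint8_t dump_pool_used[DUMP_BUF_COUNT];

/* Hand out a free buffer, or NULL when the pool is exhausted. No malloc. */
void *dump_buf_get(void)
{
    for (size_t i = 0; i < DUMP_BUF_COUNT; i++) {
        if (!dump_pool_used[i]) {
            dump_pool_used[i] = 1;
            return dump_pool[i];
        }
    }
    return NULL;
}

/* Return a buffer obtained from dump_buf_get() to the pool. */
void dump_buf_put(void *buf)
{
    size_t idx = (size_t)((uint8_t *)buf - &dump_pool[0][0]) / DUMP_BUF_SIZE;

    assert(idx < DUMP_BUF_COUNT);
    dump_pool_used[idx] = 0;
}

/* Hypothetical polled-mode hooks a driver would have to provide. */
struct dump_nic {
    int  (*tx_ready)(void *ctx);  /* nonzero when the TX FIFO has room */
    void (*tx_frame)(void *ctx, const void *frame, size_t len);
    void *ctx;
};

/*
 * Busy-wait until the device can take a frame, then hand it over.
 * Returns 0 on success, -1 if the device never became ready within
 * max_polls checks (so a dead NIC cannot hang the dump forever).
 */
int dump_send_polled(const struct dump_nic *nic, const void *frame,
                     size_t len, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if (nic->tx_ready(nic->ctx)) {
            nic->tx_frame(nic->ctx, frame, len);
            return 0;
        }
    }
    return -1;
}
```

The bounded poll count is a choice of this sketch rather than something described above. The original simply blocked, which is fine on an embedded box with one known-good driver but could wedge the dump on arbitrary hardware.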

I’m not sure this really would scratch your itch, but I believe this took me no 
more than a day or two to implement. Parts #1 and #2 would be pretty easy, but 
I’m not sure how generically the kernel could support an emergency network mode 
that doesn’t require interrupts for every network card out there. Maybe that 
isn’t as important to you as it was to us.

The whole exercise is much easier if you don’t use TFTP but a custom protocol 
that doesn’t require the crashing system to receive any packets. If it can just 
blast away at some host, oblivious to whether it’s working or not, it’s a lot 
less code to write.
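
A send-only protocol like that could be as small as the following C sketch. It is purely an assumption about one possible framing, not an existing protocol. The magic value, header layout, chunk size, and the names put_be32 and dump_build_chunk are all invented here, and the 32-bit length fields limit it to images under 4 GB. Each datagram carries enough header (sequence number, byte offset, total size) for the receiver to notice gaps and place chunks on its own, so the crashing side never has to read a reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC   0x44554d50u  /* "DUMP"; hypothetical protocol tag */
#define DUMP_HDR_LEN 16
#define DUMP_PAYLOAD 1024         /* hypothetical chunk size, below a 1500-byte MTU */

/* Store a 32-bit value in network byte order without relying on htonl(). */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Build chunk number `seq` of a dump image into `pkt`.
 * Header: magic, seq, byte offset of this chunk, total image size.
 * The payload length is implied by the datagram length.
 * Returns the packet length, or 0 when seq is past the end of the image.
 */
size_t dump_build_chunk(uint8_t *pkt, size_t pkt_cap,
                        const uint8_t *image, uint32_t image_len,
                        uint32_t seq)
{
    uint64_t off = (uint64_t)seq * DUMP_PAYLOAD;
    uint32_t n;

    if (off >= image_len)
        return 0;
    n = image_len - (uint32_t)off;
    if (n > DUMP_PAYLOAD)
        n = DUMP_PAYLOAD;
    assert(pkt_cap >= DUMP_HDR_LEN + n);

    put_be32(pkt + 0, DUMP_MAGIC);
    put_be32(pkt + 4, seq);
    put_be32(pkt + 8, (uint32_t)off);
    put_be32(pkt + 12, image_len);
    memcpy(pkt + DUMP_HDR_LEN, image + off, n);
    return DUMP_HDR_LEN + n;
}
```

The sender would call this with seq = 0, 1, 2, ... until it returns 0, pushing each packet out the configured interface. Since nothing is acknowledged, a lost datagram just leaves a hole the receiver can report, and sending the whole image twice is a cheap way to fill most holes.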


_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

