Re: Questions about kexec-tools (resend to list)

Philip Prindeville Tue, 07 Mar 2017 15:35:56 -0800

> On Mar 7, 2017, at 7:53 AM, Pratyush Anand <[email protected]> wrote:
> 
> Hi Philip,
> 
> On Sunday 05 March 2017 04:56 AM, Philip Prindeville wrote:
> 
> [...]
> 
>> 
>> In the case of having a single system kernel binary, then you’d have to 
>> install this kernel and its modules, and add this kernel to the boot loader 
>> configuration files, wouldn’t you?  What do my grub arguments look like?
> 
> Not necessarily all the modules. The kdump kernel will use only minimal modules. 
> You can build your initramfs with the minimum set of needed modules, so that 
> you can boot and copy vmcore.
> 
>> 
>> Do I always load my system kernel with “crashkernel=64M@16M” per the 
>> “CONFIG_PHYSICAL_START” and here:
> 
> In the first kernel you need to pass "crashkernel=". Passing only the size 
> (64M) should also work; the kernel should then find an appropriate start 
> address for the crash kernel.
> 
>> 
>> 
>>> 2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
>>> where Y specifies how much memory to reserve for the dump-capture kernel
>>> and X specifies the beginning of this reserved memory. For example,
>>> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
>>> starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
>> 
>> 
>> 
>> Okay, we have a 2.6MB /vmlinuz in our /boot partition, so it’s relocatable 
>> and this part applies:
>> 
>> 
>>> If you are using a compressed bzImage/vmlinuz, then use following command
>>> to load dump-capture kernel.
>>> 
>>> kexec -p <dump-capture-kernel-bzImage> \
>>> --initrd=<initrd-for-dump-capture-kernel> \
>>> --append="root=<root-dev> <arch-specific-options>"
>> 
>> 
>> 
>> Not sure I understand this part.  So if we have a relocatable kernel with 
>> crashdump built-in to our system kernel, do we need to load two kernels, 
>> just with different <arch-specific-options> and everything else being the 
>> same?
> 
> You are in the primary kernel and you need to load the crash kernel.
> 
> `kexec -p /boot/vmlinuz --initrd=/boot/kdump-initrd --reuse-cmdline 
> --append="irqpoll maxcpus=1 reset_devices"`  should work.



Tried something like that:

root@PowercodeBMU:/# kexec -p /boot/vmlinuz --reuse-cmdline --append="irqpoll maxcpus=1 reset_devices 1"
Cannot get kernel page_offset_base symbol address
Cannot load /boot/vmlinuz
root@PowercodeBMU:/# 

Not sure why I’m seeing this.
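
A quick check that may help narrow this down is whether the running kernel exposes that symbol at all. The sketch below assumes kexec looks page_offset_base up in /proc/kallsyms, which is an assumption and not something confirmed in this thread. has_ksym is a hypothetical helper name.

```shell
#!/bin/sh
# has_ksym NAME [FILE]: succeed if symbol NAME appears in a kallsyms-style
# listing (columns: address, type, name). FILE defaults to /proc/kallsyms.
# Hypothetical helper, for illustration only.
has_ksym() {
    awk -v sym="$1" '$3 == sym { found = 1 } END { exit !found }' \
        "${2:-/proc/kallsyms}"
}

# Example (not run here):
# if has_ksym page_offset_base; then echo present; else echo missing; fi
```

If the symbol turns out to be missing, that would point at a mismatch between this kexec-tools build and the kernel rather than at the command line used.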


> 
> You need to prepare kdump-initrd, OR you can use the current initrd, but that 
> will load all the modules of the 1st kernel, and 64M might then not be 
> sufficient space.


It’s an embedded system so it’s pretty skinny.  Everything needed to boot is 
“baked in”.  Everything else gets loaded as a module into the booting kernel 
via init.d scripts …



> 
>> 
>> Would the <arch-specific-options> be:
>> 
>> crashkernel=64M@16M 1 irqpoll maxcpus=1 reset_devices
> 
> "crashkernel=" *must* *not* be passed to crash kernel. It is only for the 
> primary kernel.


Okay.  And --reuse-cmdline takes care of stripping that out for you, it looks 
like.  That option isn’t discussed in Documentation/kdump/ but it might be 
handy to add something about it.
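
For setups where the command line is assembled by hand instead (as in the $(cat /proc/cmdline) variant further down), the same stripping would have to be done explicitly. A minimal sketch, assuming whitespace-separated parameters with no embedded spaces; crash_cmdline is a hypothetical helper name, and the appended options are just the ones tried above:

```shell
#!/bin/sh
# crash_cmdline "CMDLINE": print CMDLINE with every crashkernel= parameter
# dropped and the dump-capture options appended.
# Hypothetical helper; assumes no parameter contains spaces or glob characters.
crash_cmdline() {
    out=""
    for arg in $1; do
        case "$arg" in
            crashkernel=*) ;;                  # first-kernel only, skip it
            *) out="${out:+$out }$arg" ;;      # keep everything else, in order
        esac
    done
    printf '%s\n' "${out:+$out }irqpoll maxcpus=1 reset_devices 1"
}

# Example (not run here):
# kexec -p /boot/vmlinuz --append="$(crash_cmdline "$(cat /proc/cmdline)")"
```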



> 
>> 
>> in that case?
>> 
>> On a normally running system, using an overlay root, our cmdline looks like:
>> 
>> BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 
>> root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 
>> console=ttyS0,115200n8r noinitrd
> 
> So, it should also have crashkernel=64M.


Well, right.  I was talking about a nominal system before I started trying 
to get it to be crash-dump capable.


> 
>> 
>> so I guess we’d just mash on those extra arguments.  On a running system, 
>> our mount points are:
>> 
>> /dev/root on /rom type squashfs (ro,relatime)
>> proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
>> tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
>> tmpfs on /tmp/root type tmpfs (rw,noatime,mode=755)
>> tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
>> devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
>> debugfs on /sys/kernel/debug type debugfs (rw,noatime)
>> /dev/mtdblock1 on /overlay type jffs2 (rw,noatime)
>> overlayfs:/overlay on / type overlay 
>> (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)
>> 
>> 
>> but it doesn’t sound like any of that would change (except perhaps mounting 
>> a USB thumb-drive if we wanted to copy our crashdump to that device instead).


Ah, actually, that’s not quite right.  /boot has been unmounted early on but 
we’ll need to keep it mounted (even if we remount it as ‘ro’).


>> 
>> So if I’ve understood, when the first loaded kernel (the system kernel) 
>> crashes, kexec will then try the next kernel it sees…  which will be 
>> something like:
>> 
>> kexec -p /boot/vmlinuz \
>>      --append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
>> 
>> (we don’t use an initrd as you can see above) and that’s described here:
> 
> OK, so you can omit the --initrd argument to kexec.


Yes.


> 
>> 
>> 
>>> Kernel Panic
>>> ============
>>> 
>>> After successfully loading the dump-capture kernel as previously
>>> described, the system will reboot into the dump-capture kernel if a
>>> system crash is triggered. [snip]
>> 
>> 
>> 
>> assuming the system isn’t so badly hosed that a WDT expires causing a BIOS 
>> reset, etc.
>> 
>> Do both kernels use the same "crashkernel=" value, or do they need different 
>> base addresses?
> 
> Again, only the 1st kernel needs "crashkernel=".

Okay, got it.


> 
>> 
>> And assuming that you’re using the same kernel, etc. how does the init.d 
>> scripting on the crashdump (2nd instance of the kernel) know that it’s not 
>> the nominal kernel?  Do we use /sys/kernel/kexec_loaded for this purpose?  
>> Or do we just look for the existence of /proc/vmcore?
> 
> Yep, you can find /proc/vmcore in 2nd kernel but not in 1st kernel.
> /sys/kernel/kexec_crash_loaded  should have 1 in 1st kernel while 0 in crash 
> kernel.


So far I’m seeing the opposite:

root@PowercodeBMU:/# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 
root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 
console=ttyS0,115200n8r noinitrd crashkernel=64M
root@PowercodeBMU:/# cat /sys/kernel/kexec_crash_loaded
0
root@PowercodeBMU:/# 

Maybe it’s the other way around?
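
Another possible reading: kexec_crash_loaded reports whether a dump-capture kernel is currently loaded, so 0 would simply be consistent with the failed "kexec -p" above. Under that reading, /proc/vmcore is the more dependable way to tell the two kernels apart. A minimal sketch; kdump_state is a hypothetical helper name, and the two optional path arguments exist only so it can be exercised outside a real system:

```shell
#!/bin/sh
# kdump_state [VMCORE] [LOADED]: print one of
#   capture  - VMCORE exists, so this is the dump-capture kernel
#   armed    - LOADED contains 1, i.e. a crash kernel is loaded
#   unarmed  - neither of the above
# Hypothetical helper, for illustration only.
kdump_state() {
    vmcore="${1:-/proc/vmcore}"
    loaded="${2:-/sys/kernel/kexec_crash_loaded}"
    if [ -e "$vmcore" ]; then
        echo capture
    elif [ -r "$loaded" ] && [ "$(cat "$loaded")" = 1 ]; then
        echo armed
    else
        echo unarmed
    fi
}

# Example (not run here):
# case "$(kdump_state)" in
#     capture) echo "collect the dump" ;;
#     unarmed) echo "load the crash kernel with kexec -p" ;;
# esac
```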


> 
>> 
>> And then have something in my init.d scripts like:
>> 
>> kexec_loaded=$(< /sys/kernel/kexec_loaded)
> 
> /sys/kernel/kexec_crash_loaded


Right.


> 
>> 
>> if [ "$kexec_loaded" = 0 ]; then
>>  kexec -p /boot/vmlinuz \
>>      --append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
>> else
>>  echo "*** HANDLING CRASH DUMP COLLECTION"
>>  mkdir -p /mnt/crashdrive
>>  mount LABEL=crashdrive /mnt/crashdrive
>>  # might do something clever here with "df --output=avail -m /mnt/crashdrive"
>>  # to make sure I have enough space for the copy, perhaps deleting older dumps
>>  # until I do…
>>  cp /proc/vmcore /mnt/crashdrive
>>  sync
>>  umount /mnt/crashdrive
>>  echo "*** NOW REBOOTING"
>>  reboot -f
>> fi
>> 
> 
> Above should work.


Question… will crashkernel being 64M mean that /sys/kernel/kexec_crash_size is 
also 64M (67108864) and that would also be the size of /proc/vmcore?
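
On the arithmetic alone: 64M is 64 * 1024 * 1024 = 67108864 bytes. The sketch below does that conversion for K/M/G suffixes so a value can be compared against whatever /sys/kernel/kexec_crash_size reports; size_to_bytes is a hypothetical helper name. Whether /proc/vmcore comes out at that size is a separate question that this thread does not answer. As far as I understand it, vmcore describes the crashed kernel's memory rather than the reserved region, so it seems safer to check its size at copy time than to assume 64M.

```shell
#!/bin/sh
# size_to_bytes SIZE: convert a size such as 64M, 512k or 1G to bytes.
# A plain number is printed unchanged. Hypothetical helper.
size_to_bytes() {
    num="${1%[KkMmGg]}"          # numeric part with any suffix removed
    case "$1" in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Example (not run here):
# [ "$(cat /sys/kernel/kexec_crash_size)" = "$(size_to_bytes 64M)" ] && echo match
```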


> 
> There can be many ways. You can have a look at the Fedora kexec-tools code.
> http://pkgs.fedoraproject.org/cgit/rpms/kexec-tools.git/
> 
> 
>> Do I need to reboot in a particular way to avoid looping?  The “Kernel 
>> Panic” section seems to state that normal reboots won’t be affected.
> 
> When you execute reboot, it will reboot to the 1st kernel through grub (boot 
> loader).


Okay.

Thanks,

-Philip


> 
>> 
>> I appreciate the documentation you’ve written, but it’s a little unclear (to 
>> me at least) how to handle the degenerate case of using the same kernel as 
>> the system kernel and the crashdump kernel…
>> 
>> I want to make sure that I don’t inadvertently set it up to do looping 
>> infinitely nested kernels, etc.
>> 
>> I’m probably overthinking this, but… we’re having crashes in the field and 
>> the customers are a little riled up right now so I don’t want to spend a lot 
>> of time saying “here try this image”.  They want their smoking gun and they 
>> want it soon.
>> 
> 
> 
> ~Pratyush


_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec
