On Fri, Jun 09, 2006 at 06:47:59AM +0800, Zou Nan hai wrote:
> On Thu, 2006-06-08 at 16:35, Horms wrote:
> > On Thu, Jun 08, 2006 at 06:48:23AM +0800, Zou Nan hai wrote:
> > > The ia64 kdump patch is in 2 parts.
> > >
> > > the kexec-kdump-ia64-2.6.16.patch should apply on top of the previous
> > > kexec patch by Khalid in Tony's test tree.
> > >
> > > the kexec-tools-kdump-ia64.patch should apply to kexec-tools-1.101
> > > with kexec-tools-1.101-kdump.patch
> > >
> > >
> > > To test it:
> > > Build the first SMP kernel with KEXEC and KDUMP enabled.
> > >
> > > Boot it with kernel parameter "[EMAIL PROTECTED]"
> > > which means reserve XXX from YYY for crash dumping.
> > > Build a UP kernel with KEXEC KDUMP VMCORE enabled.
> > > load this kernel as a crash dumping kernel
> > > kexec -p vmlinux.gz --initrd=initrd --append="...."
> > >
> > > trigger a crash,
> > > maybe "echo c > /proc/sysrq-trigger"
> > > after the crash kernel boots,
> > > cp /proc/vmcore core
> > >
> > > gdb first_kernel_vmlinux core
> > >
> > > please test and review.
> > >
> > > Signed-off-by: Khalid Aziz <[EMAIL PROTECTED]>
> > > Signed-off-by: Zou Nan hai <[EMAIL PROTECTED]>
> >
> > Hi,
> >
> > I'm very excited to be able to play with the new version of this patch,
> > but the version you posted seems to include all of the kexec patch
> > that went into Tony Luck's tree. Here is a rediff relative to the
> > existing kexec patch (no other changes).
> >
> > The code does seem to be working for me. The main difficulty so far
> > seems to have been finding an appropriate place and size for
> > the reserved area. [EMAIL PROTECTED] seems to work for me, offering enough
> > memory and not lying on a resource boundary for me.
> >
> > Lastly, is it possible for you to comment on what areas of concern
> > you have with regards to kdump/kexec on ia64? I am looking to port this
> > code to xen, as my colleague Magnus Damm and I have already done so for i386
> > (complete) and x86_64 (almost complete).
> >
> > http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01272.html
> >
> > Signed-Off-By: Horms <[EMAIL PROTECTED]>
> >
>
> Thanks for testing and review.
>
> There is still a lot of work to do for ia64 Kdump to be a very useful
> and robust feature.
>
> Major issues.
> 1. Full percpu dumping on INIT.
> You may notice I only send an IPI to user CPUs and dump part of the
> registers for the crashing CPU. I just stop the other CPUs, without
> dumping their status. This is only a temp hack.
> On other platforms they did this by an NMI, on IA64 we should use INIT
> to acknowledge other CPUs. And I know on some platforms there is a
> trigger on the panel that can trigger INIT. We could use that to dump at
> the time of deadlock. But currently INIT is used by MCA, we need to find
> a way to coordinate with MCA on INIT.
>
> 2. unwind section is missing in vmcore.
> When you do a readelf on vmcore, you may notice there are no unwind
> sections. We should add these percpu stack unwind sections to help dump
> filter tools to analyze the core dump.
>
> 3. kdump path at crash time.
> Currently I still have to do an irq->end on each level triggered irq,
> without that the MPT fusion driver can not restart. We should fix this,
> or at least do it in a way that does not touch any memory in the
> previous kernel.
>
> 4. Other than this, we need to port the dump filter to IA64.
>
> There are still some minor issues.
> e.g.
> When I get a crash while X is active, the new kernel will start up with a
> blank screen (network is still working). I do indeed do a brute force
> VGA reset in the purgatory code. But that seems to only shut down the VGA
> and not reinit it if X is running.
>
> Current kexec can't run on a kexec'd kernel, that is because the
> memory region of the EFI memmap is not reserved in /proc/iomem, I will
> send a patch to reserve that region later.
>
> There should be other issues and gaps that we need to find.
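As an aside on the reserved-area constraint in the quoted text (enough memory, not crossing a resource boundary): a rough way to sanity-check a candidate region is to compare it against the top-level "System RAM" entries in /proc/iomem. The sketch below is only an illustration. The function names and the SAMPLE layout are made up for this example and are not taken from the patch or from kexec-tools; on a real machine the text would be read from /proc/iomem instead.

```python
import re

# A /proc/iomem line looks like "01000000-7fffffff : System RAM".
# Addresses are hexadecimal and inclusive; indentation marks nested entries.
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")

# Hypothetical layout, for illustration only (not from any real machine).
SAMPLE = """\
00000000-0009ffff : System RAM
000a0000-000bffff : PCI Bus 0000:00
01000000-7fffffff : System RAM
  04000000-047fffff : Kernel code
100000000-13fffffff : System RAM
"""


def parse_iomem(text):
    """Return top-level (start, end, name) tuples from /proc/iomem-style text."""
    regions = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(1) == "":  # skip nested (indented) entries
            start = int(m.group(2), 16)
            end = int(m.group(3), 16)
            regions.append((start, end, m.group(4).strip()))
    return regions


def fits_in_system_ram(regions, start, size):
    """True if [start, start + size) lies wholly inside one System RAM entry."""
    end = start + size - 1
    return any(
        name == "System RAM" and lo <= start and end <= hi
        for lo, hi, name in regions
    )


regions = parse_iomem(SAMPLE)
candidate_ok = fits_in_system_ram(regions, 0x10000000, 256 << 20)
```

This checks only that the candidate sits inside a single RAM resource. It says nothing about whether the range is already in use or suitably aligned for the crash kernel.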

Thanks for that list, it is very useful to me. I hope that I can find
some time to help with some of those problems.

One thing that I am puzzling over is why you shut down the PCI devices
as part of machine_crash_shutdown(). As I am trying to port your code to
xen this is quite a problem for me, as I'm not sure that Xen actually
knows enough about PCI to do this. Is it a problem relating to bringing
the devices back online after a reboot? Is it the MPT fusion problem you
mention above?

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/

_______________________________________________
fastboot mailing list
[email protected]
https://lists.osdl.org/mailman/listinfo/fastboot
