On Thu Dec 14 07:07:23 PST 2006, Khalid Aziz wrote:
> Vivek Goyal wrote:
> > On Thu, Dec 14, 2006 at 11:41:55AM +1000, Bradley Schatz wrote:
> > > Hi Vivek, Eric and Hari,
> > >
> > > Sorry to mass mail you all - it wasn't clear what the best forum to talk
> > > about kdump is.
> >
> > Hi Bradley,
> >
> > You can discuss kdump on the fastboot mailing list. Copying this thread
> > to the mailing list.
> >
> > >
> > > I read a thread on LKML recently discussing kdump and quiescing devices
> > > such as networking cards on bootup of the kdump kernel.
> > >
> > > What changes did you end up implementing to take care of this?
> > >
> >
> > So far we have been fixing individual drivers, basically hardening them
> > to initialize in a hostile environment where the underlying device
> > might be in an unknown state. We have introduced a kernel command line
> > parameter "reset_devices"; device drivers can use this parameter to
> > determine that they are initializing in a potentially hostile
> > environment and need to first try to reset the device before going
> > ahead with the rest of the initialization.
>
> Even though I agree with the philosophy that drivers should be able to
> initialize devices in a potentially hostile environment (and
> "reset_devices" should ultimately become unnecessary), there is still
> some need to quiesce the devices before kexec'ing a new kernel. We are
> already seeing problems with in-flight DMAs from non-quiesced devices
> when a hardware I/O TLB is involved. It just makes me nervous to
> attempt to kexec a new kernel without knowing the system has quieted
> down. At the very least, we should turn bus mastering off on all PCI
> devices to disable any DMAs from them before we attempt kexec.
>

I wanted to point out the PowerPC 64 bit kernel behavior in this regard.

As Eric mentioned, a fundamental assumption of kdump is that the first
kernel, and what it knows, is unreliable: we cannot call its shutdown
functions, and DMAs may still be in progress. The crash dump kernel's
memory is safe because it was always allocated, so no DMAs can be in
progress to it.

If the new kernel is a crash reboot kernel, then instead of clearing the
IOMMU, we search it for free entries. Any entry that is not free is
marked busy and will not be allocated by the DMA APIs. The hardware
IOMMU tables are either in the hypervisor or are excluded from any kexec
allocation, so we naturally reuse the same hardware tables, and instead
of clearing them we mark the in-use entries as busy in the allocator.
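To make the crash-kernel scan concrete, here is a minimal Python model of it. It is a simulation under stated assumptions, not the ppc64 kernel code: the flat table layout, the convention that a zero entry means "free", the first-fit allocator, and the names IommuTable, FREE and alloc are all hypothetical.

```python
# Hypothetical model of the crash-kernel IOMMU handling described above.
# IommuTable, FREE and alloc are illustrative names, not kernel APIs.

FREE = 0  # assumption: a zero entry means "no translation installed"


class IommuTable:
    def __init__(self, hw_entries, crash_kernel):
        # hw_entries stands in for the hardware table shared with the
        # previous kernel; the allocator keeps its own "busy" view of it.
        self.hw = hw_entries
        if crash_kernel:
            # Do not clear the table: any non-free entry may still be the
            # target of an in-flight DMA, so reserve it in the allocator.
            self.busy = [entry != FREE for entry in hw_entries]
        else:
            # Normal boot: clear the table and start with everything free.
            for i in range(len(self.hw)):
                self.hw[i] = FREE
            self.busy = [False] * len(hw_entries)

    def alloc(self, npages):
        """First-fit search for npages contiguous free entries.

        Returns the starting index, or None if no run is long enough.
        """
        if npages <= 0:
            raise ValueError("npages must be positive")
        run = 0
        for i, used in enumerate(self.busy):
            run = 0 if used else run + 1
            if run == npages:
                start = i - npages + 1
                for j in range(start, i + 1):
                    self.busy[j] = True
                return start
        return None


if __name__ == "__main__":
    # Entries 2, 3 and 7 still hold mappings left by the first kernel.
    hw = [FREE] * 12
    hw[2], hw[3], hw[7] = 0x1000, 0x2000, 0x3000
    table = IommuTable(hw, crash_kernel=True)
    print(table.alloc(4))  # 8: first run of 4 free entries
    print(table.alloc(4))  # None: no 4-entry run is left
    print(table.alloc(2))  # 0: a smaller request still fits
```

The last two calls illustrate the trade-off raised in the next paragraph: with inherited entries reserved, a large contiguous request can fail while smaller ones still succeed, so a driver may have to split its scatter-gather lists or keep fewer DMAs in flight.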

In theory this could leave a device with no IOMMU resources available
to it, but I haven't heard of that case occurring. It may stress a
driver's scatter-gather handling, or require fewer DMAs in flight.

milton

_______________________________________________
fastboot mailing list
[email protected]
https://lists.osdl.org/mailman/listinfo/fastboot