Hi,

How can I reconcile the need to have Kdump configured and operational on
cluster nodes with the need to fence a node, which on HP servers is most
commonly and conveniently implemented through iLO?

Customers require Kdump to be configured and operational so that kernel
crashes can be analysed by Red Hat support. Taking a crash dump starts
immediately after the crash, but on a machine with 512 GB of memory it may
take a very long time (more than an hour) if done at dumplevel 0 over a
1 GbE network. However, if I use iLO fencing, the crashed node will be
powered off through iLO, which irrecoverably kills the kernel dump in
progress and erases the memory contents holding the crashed kernel image.
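As a rough check on the "more than an hour" figure, here is a back-of-envelope
sketch. It assumes (for illustration only) that the full 512 GB of RAM is
written at dumplevel 0, that the 1 GbE link sustains its full line rate, and
that protocol overhead and compression are ignored, so real transfers would
take longer. The function name dump_time_seconds is hypothetical.

```python
# Back-of-envelope lower bound on kdump transfer time for a full dump.
# Assumptions (hypothetical): every byte of RAM is sent, the link runs at
# its nominal line rate, and there is no protocol overhead or compression.

def dump_time_seconds(memory_gib: float, link_gbit_per_s: float) -> float:
    """Seconds needed to push memory_gib GiB over a link_gbit_per_s link."""
    bits = memory_gib * 1024**3 * 8          # GiB -> bytes -> bits
    return bits / (link_gbit_per_s * 1e9)    # link rate in bits per second

if __name__ == "__main__":
    seconds = dump_time_seconds(512, 1.0)
    print(f"{seconds:.0f} s = {seconds / 60:.1f} min")
```

Run as-is it prints "4398 s = 73.3 min": even under ideal conditions the
dump outlasts an hour, long before any real-world overhead is counted.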

Ideally, I would like the functionality present in several UNIX clusters,
where a crashed node is allowed to complete its kernel crash dump in peace.
In UNIX clusters the crashed node can be configured to reboot automatically
after a kernel crash and rejoin the cluster. It typically writes out the
kernel dump as part of the boot.

The UNIX clusters typically use SCSI reservations to protect the integrity of
storage. This lets them keep the failed node isolated while it is still able
to take the kernel crash dump before rejoining the cluster. I believe this
option is not available in Linux Cluster.

So, how can I have a functioning Linux cluster that can take kernel crash
dumps of crashed nodes without blocking access to the shared GFS2
filesystem for the hour or so that a crash dump may take on a very large
system?

Thanks and regards,

Chris Jankowski

--
Linux-cluster mailing list
https://www.redhat.com/mailman/listinfo/linux-cluster
