On Wed, 2008-01-09 at 15:04 +0100, Alain Moulle wrote:
> Hi
> 
> While testing CS5 on a two-node cluster with a quorum disk, I ran the
> ifdown test on the heartbeat interface and got a segfault in the log:

> Jan  9 09:45:30 [EMAIL PROTECTED] openais[28300]: [TOTEM] entering OPERATIONAL state.
> Jan  9 09:45:30 [EMAIL PROTECTED] openais[28300]: [CLM  ] got nodejoin message 172.16.101.91
> Jan  9 09:45:30 [EMAIL PROTECTED] openais[28300]: [EVT  ] recovery error node: r(0) ip(127.0.0.1)  not found
> Jan  9 09:45:30 [EMAIL PROTECTED] kernel: clurgmgrd[28359]: segfault at 0000000000000000 rip 0000000000408c4a rsp 00007fff04a2c450 error 4
> Jan  9 09:45:30 [EMAIL PROTECTED] gfs_controld[28328]: cluster is down, exiting
> Jan  9 09:45:30 [EMAIL PROTECTED] kernel: dlm: closing connection to node 2
> Jan  9 09:45:30 [EMAIL PROTECTED] kernel: dlm: closing connection to node 0
> Jan  9 09:45:30 [EMAIL PROTECTED] kernel: dlm: closing connection to node 1
> Jan  9 09:45:30 [EMAIL PROTECTED] dlm_controld[28322]: cluster is down, exiting
> Jan  9 09:45:30 [EMAIL PROTECTED] fenced[28316]: cman_get_nodes error -1 104
> Jan  9 09:45:30 [EMAIL PROTECTED] fenced[28316]: cluster is down, exiting
> Jan  9 09:45:30 [EMAIL PROTECTED] clurgmgrd[28358]: <crit> Watchdog: Daemon died, rebooting...
> Jan  9 09:45:30 [EMAIL PROTECTED] shutdown[18377]: shutting down for system halt
> 
> Is this already a known problem?

openais died, causing the dlm to go away and rgmanager to crash - the
"nanny" clurgmgrd process rebooted the node.

Although the segfault is probably less than ideal, the nanny process
killing the node is probably fine since the node needs to be fenced at
this point anyway.

What should have happened with rgmanager is that it should have:
* seen a negative quorum transition,
* halted cluster services uncleanly, and
* waited to be fenced.
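
As a rough illustration of the sequence listed above, here is a minimal
sketch in Python. It is not rgmanager code: the Action names and the
on_quorum_change() helper are hypothetical and exist only to show the
intended ordering when a node goes from quorate to inquorate.

```python
from enum import Enum, auto
from typing import List


class Action(Enum):
    """Hypothetical action names, used only for this illustration."""
    NONE = auto()
    HALT_SERVICES_UNCLEANLY = auto()
    WAIT_FOR_FENCE = auto()


def on_quorum_change(was_quorate: bool, is_quorate: bool) -> List[Action]:
    """Return the ordered actions for a quorum transition.

    A "negative" transition means the node was quorate and no longer is.
    In that case the sketch halts services without a clean stop and then
    simply waits, on the assumption that the node will be fenced.
    """
    if was_quorate and not is_quorate:
        return [Action.HALT_SERVICES_UNCLEANLY, Action.WAIT_FOR_FENCE]
    # Any other transition needs no special handling in this sketch.
    return [Action.NONE]


if __name__ == "__main__":
    # Quorum lost, as in the ifdown test on the heartbeat interface.
    for action in on_quorum_change(was_quorate=True, is_quorate=False):
        print(action.name)
```

The point of the ordering is that nothing in this path should crash or
reboot the node by itself; the node just stops its services and leaves
recovery to fencing.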

-- Lon
