On Nov 15, 2007, at 6:22 AM, Junko IKEDA wrote:
The root cause seems to be that heartbeat is not providing client
status messages (to say that the crmd processes are active) once the
split-brain heals.
crmd[1350]: 2007/11/08_10:38:43 info: join_make_offer: Peer process on dl380g5c is not active (yet?)
crmd[1350]: 2007/11/08_10:40:11 WARN: do_state_transition: Only 1 of 2 cluster nodes are eligible to run resources - continue 0
Because of this, the crm doesn't consider dl380g5c online and the PE
can't shut it down.
I think you need to file a bug for Alan about this.
I found a similar case.
While recovering from a split brain,
one node was ultimately unable to join the membership.
crmd[6657]: 2007/11/15_14:04:11 debug: crmd_ha_msg_callback: Ignoring HA message (op=noop) from prec370d: not in our membership list (size=1)
according to the ccm on prec370e, prec370d really isn't part of the
cluster... what does the other node think?
looks like some sort of communications or ccm bug, if you attach the
logs from prec370d it might be possible to say which.
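When going through large logs for this kind of problem, it can help to count how often each peer's messages are being dropped. Below is a rough sketch, assuming the log lines look like the crmd debug line quoted above; the names IGNORED and count_ignored_peers are made up for illustration and are not part of heartbeat or the crm, and other versions may format the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the single crmd debug line quoted in this thread.
# This is an assumption: other heartbeat versions may word it differently.
IGNORED = re.compile(
    r"crmd\[(?P<pid>\d+)\]:\s+(?P<ts>\S+)\s+debug:\s+crmd_ha_msg_callback:\s+"
    r"Ignoring HA\s+message \(op=(?P<op>\w+)\) from (?P<node>\S+):\s+"
    r"not in our membership list \(size=(?P<size>\d+)\)"
)


def count_ignored_peers(log_text):
    """Return a Counter mapping node name to number of ignored HA messages."""
    return Counter(m.group("node") for m in IGNORED.finditer(log_text))


# The sample is the line from this thread, joined onto one logical line.
sample = (
    "crmd[6657]: 2007/11/15_14:04:11 debug: crmd_ha_msg_callback: "
    "Ignoring HA message (op=noop) from prec370d: "
    "not in our membership list (size=1)\n"
)

if __name__ == "__main__":
    for node, hits in count_ignored_peers(sample).items():
        print(node, hits)
```

The pattern uses \s+ between fields so that it still matches if the mailer or the logger wrapped the line in the middle.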
and its state transitions looped,
from S_FINALIZE_JOIN -> S_INTEGRATION to S_INTEGRATION -> S_FINALIZE_JOIN
and so on.
yeah, given the circumstances (conflicting data from heartbeat and the
ccm) that is to be expected unfortunately.
Even worse, the system was rebooted for unexplained reasons...
Message from [EMAIL PROTECTED] at Thu Nov 15 14:06:03 2007 ...
prec370d heartbeat: [2572]: EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/crmd
that's Alan's new suicide code in action... you'll have to take its
existence up with him
I don't think crmd is the underlying cause in this case...
This case is hard to reproduce; it seems to be a matter of timing.
The logs were very big, so I filed them here:
http://developerbugs.linux-foundation.org//show_bug.cgi?id=1779
Thanks,
Junko
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems