Hi,
I wonder whether this is caused by a programming error:
Jul 19 12:52:47 h03 corosync[14584]: [MAIN ] Corosync Cluster Engine ('1.3.1'): started and ready to provide service.
[...]
Jul 19 12:52:47 h03 corosync[14584]: [pcmk ] info: send_member_notification: Sending membership update 1024 to 0 children
Jul 19 12:52:47 h03 corosync[14584]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul 19 12:52:47 h03 corosync[14584]: [TOTEM ] Marking ringid 1 interface 10.2.2.3 FAULTY - administrative intervention required.
[...]
Jul 19 12:52:47 h03 attrd: [14624]: info: main: Cluster connection active
Jul 19 12:52:47 h03 attrd: [14624]: info: main: Accepting attribute updates
Jul 19 12:52:47 h03 attrd: [14624]: notice: main: Starting mainloop...
Jul 19 12:52:47 h03 stonith-ng: [14621]: info: init_ais_connection_classic: AIS connection established
[...]
Jul 19 12:52:48 h03 crmd: [14626]: info: do_started: Delaying start, no membership data (0000000000100000)
Jul 19 12:52:48 h03 crmd: [14626]: info: crmd_init: Starting crmd's mainloop
Jul 19 12:52:48 h03 crmd: [14626]: notice: ais_dispatch_message: Membership 1024: quorum acquired
To me it looks as if Pacemaker tried to send a message to nobody ("0 children"). That in turn seemed to cause an error (the IPC delivery failure to local.crmd), which in turn got an interface marked FAULTY.
This situation was observed when the node booted (and was expected to join the two other nodes of a three-node cluster). We see FAULTY interfaces now and then, but we wonder what condition causes them.
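One way to check whether a FAULTY marking follows the IPC failure on every occurrence is to scan the syslog for the three messages in order. The following is a minimal Python sketch, assuming unwrapped syslog lines in the format quoted above; the pattern names and the helpers `find_events` and `faulty_after_ipc_failure` are hypothetical and not part of corosync or Pacemaker. It only shows ordering within the log, not causation: all three messages above carry the same one-second timestamp.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for the three messages discussed in this post.
# They match the unwrapped syslog lines quoted above and nothing more.
PATTERNS = {
    "zero_children": re.compile(r"Sending membership update \d+ to 0 children"),
    "ipc_failed": re.compile(r"ipc delivery failed \(rc=-?\d+\)"),
    "ring_faulty": re.compile(r"Marking ringid (\d+) interface (\S+) FAULTY"),
}

# Syslog prefix, e.g. "Jul 19 12:52:47 h03 " -> (timestamp, host).
PREFIX = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ")

# Sample input: the relevant lines from the log excerpt above.
SAMPLE_LOG = [
    "Jul 19 12:52:47 h03 corosync[14584]: [pcmk ] info: send_member_notification: "
    "Sending membership update 1024 to 0 children",
    "Jul 19 12:52:47 h03 corosync[14584]: [pcmk ] WARN: route_ais_message: "
    "Sending message to local.crmd failed: ipc delivery failed (rc=-2)",
    "Jul 19 12:52:47 h03 corosync[14584]: [TOTEM ] Marking ringid 1 interface "
    "10.2.2.3 FAULTY - administrative intervention required.",
    "[...]",
]


def find_events(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, host, event name) for each line matching a pattern."""
    events = []
    for line in lines:
        prefix = PREFIX.match(line)
        if prefix is None:
            continue  # not a syslog line, e.g. the "[...]" elision marker
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((prefix.group(1), prefix.group(2), name))
    return events


def faulty_after_ipc_failure(events: List[Tuple[str, str, str]]) -> bool:
    """True if a ring_faulty event appears after an ipc_failed event.

    Assumes the events are already in chronological order, as they are
    when read from a single syslog file.
    """
    seen_ipc_failure = False
    for _, _, name in events:
        if name == "ipc_failed":
            seen_ipc_failure = True
        elif name == "ring_faulty" and seen_ipc_failure:
            return True
    return False


if __name__ == "__main__":
    found = find_events(SAMPLE_LOG)
    for timestamp, host, name in found:
        print(timestamp, host, name)
    print("FAULTY after IPC failure:", faulty_after_ipc_failure(found))
```

Running it over several boots' worth of logs would show whether FAULTY markings also occur without a preceding IPC failure, which would argue against a direct link between the two.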
The software is that of SLES11 SP1 (openais-1.1.4-5.6.3).
Regards,
Ulrich
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems