Hi!

We had a problem when crmd crashed. After being restarted, crmd apparently tried to recover, but it seems recovery is not implemented yet:

kernel: [ 2523.296059] crmd[17500]: segfault at 14 ip 0000000000418110 sp 00007fffe415d370 error 4 in crmd[400000+3a000]
cib: [17496]: WARN: send_via_callback_channel: Delivery of reply to client 17500/8b364949-0abd-40cd-a0cd-8ff9ea184d02 failed
corosync[17457]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process crmd terminated with signal 11 (pid=17500, core=true)
crmd: [21618]: info: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
crmd: [21618]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
crmd: [21618]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
crmd: [21618]: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
crmd: [21618]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd

Unfortunately, that strategy was not successful:

corosync[17457]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: crmd
crmd: [28719]: info: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
crmd: [28719]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
crmd: [28719]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY

This cycle repeated for more than two hours, until the other node of the two-node cluster rebooted:

pengine: [17043]: WARN: pe_fence_node: Node h07 will be fenced because it is un-expectedly down

The software used is basically SLES11 SP1 with a newer corosync (corosync-1.4.1-0.3.3.3518.1.PTF.712037).

Have there been any improvements since then on the main development line of corosync/pacemaker?

Regards,
Ulrich
