Hi!

We had a problem when crmd crashed. After being restarted, crmd tried to
recover, but it seems recovery is not implemented yet:

kernel: [ 2523.296059] crmd[17500]: segfault at 14 ip 0000000000418110 sp 00007fffe415d370 error 4 in crmd[400000+3a000]
cib: [17496]: WARN: send_via_callback_channel: Delivery of reply to client 17500/8b364949-0abd-40cd-a0cd-8ff9ea184d02 failed
corosync[17457]:  [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd terminated with signal 11 (pid=17500, core=true)
crmd: [21618]: info: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
crmd: [21618]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
crmd: [21618]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
crmd: [21618]: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
crmd: [21618]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd

Unfortunately, that strategy was not successful: corosync respawned crmd,
which immediately ran into the same unsupported recovery action again:
corosync[17457]: [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: crmd
crmd: [28719]: info: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
crmd: [28719]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
crmd: [28719]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY

This respawn loop went on for more than two hours, until the other node of
the two-node cluster rebooted.

pengine: [17043]: WARN: pe_fence_node: Node h07 will be fenced because it is un-expectedly down

The software we use is basically SLES11 SP1 with a newer corosync
(corosync-1.4.1-0.3.3.3518.1.PTF.712037). Have there been any improvements in
this area on the main development line of corosync/pacemaker since then?

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems