>>> Andrew Beekhof <[email protected]> wrote on 30.03.2012 at 00:57 in message
<caedlwg3dqfffq+bgbdqc-fexjo9rhuujtts8_wryn_-kzed...@mail.gmail.com>:
> On Thu, Mar 29, 2012 at 8:31 PM, Ulrich Windl
> <[email protected]> wrote:
> > Hi!
> >
> > We had a problem when crmd crashed. Obviously, crmd after being restarted 
> tried to recover, but it seems recovery is not implemented yet:
> 
> Recovery is implemented, just not graceful recovery without a restart
> of the process, which is what you're seeing in the logs.

I'm specifically referring to "crmd: [21618]: ERROR: do_recover: Action 
A_RECOVER (0000000001000000) not supported". And crmd obviously was restarted, 
because the previous PID was 17500.
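
The PID comparison above can be sketched in C as follows. This is only a minimal illustration, assuming the two line shapes that appear in the logs quoted in this thread ("crmd: [PID]:" in the daemon log and "crmd[PID]:" in the kernel log); log_pid() and was_restarted() are hypothetical helper names, not part of Pacemaker or corosync.

```c
#include <stdio.h>
#include <string.h>

/* Extract the PID that follows a daemon name in a syslog-style line.
 * Handles the two shapes seen in the quoted logs:
 *   "crmd: [21618]: ERROR: ..."   (daemon log)
 *   "crmd[17500]: segfault ..."   (kernel log)
 * Returns the PID, or -1 if no "name[pid]" / "name: [pid]" is found.
 * Hypothetical helper for illustration only. */
long log_pid(const char *line, const char *daemon)
{
    size_t dlen = strlen(daemon);
    const char *p = line;

    while ((p = strstr(p, daemon)) != NULL) {
        const char *q = p + dlen;
        long pid;

        if (*q == ':')          /* "crmd: [pid]" form */
            q++;
        while (*q == ' ')
            q++;
        if (sscanf(q, "[%ld]", &pid) == 1)
            return pid;
        p += dlen;              /* no PID here, keep searching */
    }
    return -1;
}

/* Returns 1 if both lines carry a PID for the daemon and the PIDs differ,
 * i.e. the daemon was restarted between the two lines; 0 otherwise. */
int was_restarted(const char *before, const char *after, const char *daemon)
{
    long a = log_pid(before, daemon);
    long b = log_pid(after, daemon);

    return a > 0 && b > 0 && a != b;
}
```

Feeding it the kernel segfault line and the do_recover line from the logs yields PIDs 17500 and 21618, so was_restarted() reports a restart between them.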

Regards,
Ulrich


> The underlying cause however is that the lrmd, specifically the call
> we make below, isn't behaving as expected.
> 
>     call_id = rsc->ops->perform_op(rsc, op);
> 
>     if (call_id <= 0) {
>         crm_err("Operation %s on %s failed: %d", operation, rsc->id, call_id);
>         register_fsa_error(C_FSA_INTERNAL, I_FAIL, NULL);
> ...
> 
> > kernel: [ 2523.296059] crmd[17500]: segfault at 14 ip 0000000000418110 sp 
> 00007fffe415d370 error 4 in crmd[400000+3a000]
> > cib: [17496]: WARN: send_via_callback_channel: Delivery of reply to client 
> 17500/8b364949-0abd-40cd-a0cd-8ff9ea184d02 failed
> > corosync[17457]:  [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd 
> terminated with signal 11 (pid=17500, core=true)
> > crmd: [21618]: info: do_state_transition: State transition S_NOT_DC -> 
> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
> > crmd: [21618]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not 
> supported
> > crmd: [21618]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() 
> received in state S_RECOVERY
> > crmd: [21618]: info: do_state_transition: State transition S_RECOVERY -> 
> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> > crmd: [21618]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the 
> CRMd
> >
> > Unfortunately that strategy was not successful.
> > corosync[17457]: [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed 
> child process: crmd
> > crmd: [28719]: info: do_state_transition: State transition S_NOT_DC -> 
> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op  ]
> > crmd: [28719]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not 
> supported
> > crmd: [28719]: ERROR: do_log: FSA: Input I_TERMINATE from do_recover() 
> received in state S_RECOVERY
> >
> > The game repeated for more than two hours until the other node of the 
> two-node cluster rebooted.
> >
> > pengine: [17043]: WARN: pe_fence_node: Node h07 will be fenced because it 
> is un-expectedly down
> >
> > The software used is basically SLES11 SP1 with a newer corosync 
> (corosync-1.4.1-0.3.3.3518.1.PTF.712037). Were there any improvements since 
> that on the main development line of corosync/pacemaker?
> >
> > Regards,
> > Ulrich
> >
> >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected] 
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha 
> > See also: http://linux-ha.org/ReportingProblems 

 
 
