On Thu, Sep 1, 2011 at 10:00 PM,  <[email protected]> wrote:
> Hi
>
> My release is :
> pacemaker-1.1.2-7 (on RHEL6)
> and I have checked that the patch :
> High: PE: Bug lf#2433 - No services should be stopped until probes finish
> is indeed included in this release.
>
> Nevertheless, I seem to hit a similar problem from time to time, for
> whatever primitive: a primitive under Pacemaker is flagged "failed" on
> one node while it is already started on the other node. A simple
> cleanup on the group then erases the failure and all is fine. It
> happens within about two hours when I run a loop (a robustness test)
> that migrates the group (which includes the primitive) from one node
> to the other and back, with a delay of 300s between each migration.
>
> If I compare the syslog output from a run where all is fine with one
> where I get the error, the first difference I find is:
> node1 daemon info lrmd [38904]: info: flush_op: process for operation
> monitor[2973] on ocf:<provider>:<scriptname>::<primitive name> for client
> 38907 still running, flush delayed
> node1 daemon debug crmd [38907]: debug: cancel_op: Op 2973 for
> <primitive-name> (<primitive-name>:2973): cancelled
>
> It seems that Pacemaker applies the stop to the primitive running on node1
> at the very moment a monitor operation is checking the primitive, so the
> cancellation of the monitor is delayed. The primitive stop takes effect and
> the primitive starts on node2. After 20 seconds, the monitor operation on
> node1 runs again, fails, and is reported as an error on node1.
> Therefore no further switch to node1 is possible unless a manual crm
> cleanup is executed on the primitive.
>
> Thanks for your ideas on this problem.

Sounds like a bug in the lrmd to me.  I'd say file a bug, but the bug
tracker is still down after the LF got hacked a few weeks back :-(
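
To see how often the delayed flush shows up during the migration loop, a
small log scan may help. This is only a sketch: the regular expression is
modelled on the single lrmd line quoted above (assumed to be one unwrapped
line in the real syslog), and the function name find_delayed_flushes is
made up for illustration. Other lrmd versions may format the message
differently.

```python
import re

# Assumption: the layout matches the lrmd "flush_op" line quoted in this
# thread. Adjust the pattern if your lrmd version logs it differently.
FLUSH_DELAYED = re.compile(
    r"flush_op: process for operation (?P<op>\w+)\[(?P<id>\d+)\] on "
    r"(?P<rsc>.+?) for client \d+ still running, flush delayed"
)


def find_delayed_flushes(lines):
    """Return (line_number, operation, call_id, resource) per delayed flush."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = FLUSH_DELAYED.search(line)
        if match:
            hits.append(
                (number, match.group("op"), int(match.group("id")),
                 match.group("rsc"))
            )
    return hits
```

Feeding it the lines of the syslog file from node1 gives the call ids
(2973 in the excerpt above) that can then be matched against the later
crmd cancel_op entries and the failed monitor.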


> Alain
>
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>