On Thu, Sep 1, 2011 at 10:00 PM, <[email protected]> wrote:
> Hi
>
> My release is:
>   pacemaker-1.1.2-7 (on RHEL6)
> and I have checked that the patch:
>   High: PE: Bug lf#2433 - No services should be stopped until probes finish
> is indeed integrated in this release.
>
> Nevertheless, I seem to hit a similar problem from time to time, for any
> primitive: a primitive under Pacemaker is flagged "failed" on one node
> while it is already started on the other node. A simple cleanup on the
> group clears the failure and all is fine again, but the problem shows up
> within roughly two hours when I run a loop (a robustness test) that
> migrates the group (which includes the primitive) from one node to the
> other and back, with a delay of 300s between each migration.
>
> If I compare the logs (syslog) generated by the scenario when all is fine
> and when I get the error, the first difference I find is:
>
> node1 daemon info lrmd [38904]: info: flush_op: process for operation monitor[2973] on ocf:<provider>:<scriptname>::<primitive name> for client 38907 still running, flush delayed
> node1 daemon debug crmd [38907]: debug: cancel_op: Op 2973 for <primitive-name> (<primitive-name>:2973): cancelled
>
> It seems that Pacemaker stops the primitive running on node1 at the very
> moment a monitor operation is checking it, so the cancellation of the
> monitor operation is delayed. The stop itself succeeds and the primitive
> starts on node2. After 20 seconds, the monitor operation runs again on
> node1, fails, and is reported as a failure on node1. From then on, no
> switch back to node1 is possible unless a manual crm cleanup is executed
> on the primitive.
>
> Thanks for your ideas on this problem.

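
For anyone who wants to reproduce this, the migration loop described above could be sketched roughly as below. This is only an illustration under assumptions, not the script used in the original test: the group name "my-group", the node names, and the helper names migration_commands and run_loop are hypothetical, and it assumes the crm shell's "crm resource migrate <resource> <node>" command is what drives each migration. It prints the commands by default and only calls crm when dry_run is set to False.

```python
import subprocess
import time


def migration_commands(group, nodes, rounds):
    """Build the crm shell commands for `rounds` alternating migrations.

    `group` and `nodes` are placeholders; use the names from your own CIB.
    """
    commands = []
    for i in range(rounds):
        # Alternate the target node on every round.
        target = nodes[i % len(nodes)]
        commands.append(["crm", "resource", "migrate", group, target])
    return commands


def run_loop(group, nodes, rounds, delay=300, dry_run=True):
    """Run (or just print) the migrations, waiting `delay` seconds after each."""
    for cmd in migration_commands(group, nodes, rounds):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
            time.sleep(delay)


if __name__ == "__main__":
    # Hypothetical names; 300s matches the delay mentioned in the report.
    run_loop("my-group", ["node2", "node1"], rounds=4, delay=300)
```

Watching syslog for "flush delayed" around each migration should show whether a monitor was still in flight when the stop was issued.
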
Sounds like a bug in the lrmd to me.
I'd say file a bug, but the bug tracker is still down after the LF got
hacked a few weeks back :-(

> Alain

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
