On Wed, Dec 29, 2010 at 10:02:41AM -0700, Greg Woods wrote:
> On Wed, 2010-12-29 at 12:56 +0100, Dejan Muhamedagic wrote:
> 
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: do_lrm_rsc_op:
> > > Performing key=21:2:0:fb701221-ba59-4de8-88dc-032cab9ec090
> > > op=vmgroup1:0_stop_0 )
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: info:
> > > rsc:vmgroup1:0:30: stop
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: do_lrm_rsc_op:
> > > Performing key=50:2:0:fb701221-ba59-4de8-88dc-032cab9ec090
> > > op=vmgroup2:0_stop_0 )
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: info:
> > > rsc:vmgroup2:0:31: stop
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: WARN: Managed
> > > vmgroup1:0:stop process 8088 exited with return code 6.
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info:
> > > process_lrm_event: LRM operation vmgroup1:0_stop_0 (call=30, rc=6,
> > > cib-update=36, confirmed=true) not configured
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: WARN: Managed
> > > vmgroup2:0:stop process 8089 exited with return code 6.
> > 
> > No messages from the drbd RA?
> 
> Nothing that I can see. It looks, however, like the same kind of error
> is occurring with many or all of the resources. I have attached the
> complete halog entries for the time period in question.
> 
> > This smells like a bug found in 1.0.9 which should've been
> > fixed a while ago:
> > 
> > http://developerbugs.linux-foundation.org/show_bug.cgi?id=2458
> 
> After reading that report, it doesn't look like the same problem to me,
> but I will freely admit that the logs are hard for me to interpret.

Welcome to the club.

> There are entries like this showing what appear to be the correct
> parameters:
> 
> 
> Dec 28 09:19:13 vmserve.scd.ucar.edu lrmd: [7514]: notice: max_child_count 
> (4) reached, postponing execution of operation monitor[10] on 
> ocf::LVM::DRBDVG0 for client 7518, its parameters: volgrpname=[DRBDVG0] 
> CRM_meta_timeout=[20000] crm_feature_set=[3.0.1]  by 1000 ms

OK. Judging by the logs, the issue seems to occur exclusively in
the stop operation, so it's probably some variation of bug #2458.
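
To double-check that against the complete log, a rough sketch like the
one below could tally the non-zero LRM results per operation. It assumes
the unwrapped crmd "process_lrm_event" line format seen in the excerpt
above; the names OCF_RC, LRM_EVENT and failed_ops are made up for
illustration and are not part of pacemaker or hb_report.

```python
import re
from collections import Counter

# Standard OCF resource agent exit codes; rc=6 is what the log
# reports as "not configured".
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Assumed line shape (taken from the excerpt in this thread):
#   ... process_lrm_event: LRM operation vmgroup1:0_stop_0 (call=30, rc=6, ...
LRM_EVENT = re.compile(
    r"LRM operation (?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) "
    r"\(call=\d+, rc=(?P<rc>\d+)"
)


def failed_ops(log_text):
    """Count LRM results with a non-zero rc, keyed by (operation, rc)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LRM_EVENT.search(line)
        if m is None:
            continue
        rc = int(m.group("rc"))
        if rc != 0:
            counts[(m.group("op"), rc)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: "
        "process_lrm_event: LRM operation vmgroup1:0_stop_0 "
        "(call=30, rc=6, cib-update=36, confirmed=true) not configured"
    )
    for (op, rc), n in sorted(failed_ops(sample).items()):
        print(op, rc, OCF_RC.get(rc, "unknown"), n)
```

Feed it the contents of the real log file rather than the mail, since
the lines get wrapped here. Keep in mind that rc=7 from a monitor probe
is normally expected and not a failure, so only the stop entries are
interesting for this problem.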

> > If it's not a resource problem (i.e. drbd), please either reopen
> > the bugzilla above or open a new one if it looks like a different
> > problem. Don't forget to attach hb_report.
> 
> If you don't see anything obvious in the attached more complete log, I
> will gladly do so. In the meantime, I may have to downgrade pacemaker so
> that I can get my cluster back. We are running in non-HA mode right now.

If you could run the node in debug mode before generating
hb_report, that would perhaps help.

Thanks,

Dejan

> --Greg
> 
> 
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
