On Wed, Dec 29, 2010 at 10:02:41AM -0700, Greg Woods wrote:
> On Wed, 2010-12-29 at 12:56 +0100, Dejan Muhamedagic wrote:
>
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: do_lrm_rsc_op:
> > > Performing key=21:2:0:fb701221-ba59-4de8-88dc-032cab9ec090
> > > op=vmgroup1:0_stop_0 )
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: info:
> > > rsc:vmgroup1:0:30: stop
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info: do_lrm_rsc_op:
> > > Performing key=50:2:0:fb701221-ba59-4de8-88dc-032cab9ec090
> > > op=vmgroup2:0_stop_0 )
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: info:
> > > rsc:vmgroup2:0:31: stop
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: WARN: Managed
> > > vmgroup1:0:stop process 8088 exited with return code 6.
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu crmd: [7518]: info:
> > > process_lrm_event: LRM operation vmgroup1:0_stop_0 (call=30, rc=6,
> > > cib-update=36, confirmed=true) not configured
> > > Dec 28 09:19:18 vmserve.scd.ucar.edu lrmd: [7514]: WARN: Managed
> > > vmgroup2:0:stop process 8089 exited with return code 6.
> >
> > No messages from the drbd RA?
>
> Nothing that I can see. It looks, however, like the same kind of error
> is occurring with many or all of the resources. I have attached the
> complete halog entries for the time period in question.
>
> > This smells like a bug found in 1.0.9 which should've been
> > fixed a while ago:
> >
> > http://developerbugs.linux-foundation.org/show_bug.cgi?id=2458
>
> After reading that report, it doesn't look like the same problem to me,
> but I will freely admit that the logs are hard for me to interpret.

Welcome to the club.

> There are entries like this showing what appear to be the correct
> parameters:
>
>
> Dec 28 09:19:13 vmserve.scd.ucar.edu lrmd: [7514]: notice: max_child_count
> (4) reached, postponing execution of operation monitor[10] on
> ocf::LVM::DRBDVG0 for client 7518, its parameters: volgrpname=[DRBDVG0]
> CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] by 1000 ms

OK. Judging by the logs the issue seems to occur exclusively in
the stop operation, so it's probably some variation of bug #2458.

> > If it's not a resource problem (i.e. drbd), please either reopen
> > the bugzilla above or open a new one if it looks like a different
> > problem. Don't forget to attach hb_report.
>
> If you don't see anything obvious in the attached more complete log, I
> will gladly do so. In the meantime, I may have to downgrade pacemaker so
> that I can get my cluster back. We are running in non-HA mode right now.

If you could run the node in the debug mode before generating
hb_report that would perhaps help.

Thanks,

Dejan

> --Greg
>
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
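
For anyone wanting to check the "exclusively in the stop operation" observation against a complete log, here is a rough Python sketch that tallies the lrmd "exited with return code" warnings by operation and return code. It assumes the log lines have the same shape as the ones quoted above and have not been wrapped by a mail client. The function names and the regular expression are made up for illustration and are not part of heartbeat or pacemaker. Return code 6 is the OCF "not configured" error, as the crmd line in the quoted log also reports.

```python
import re
import sys
from collections import Counter

# Matches lrmd warnings shaped like the ones quoted in this thread, e.g.
#   "... lrmd: [7514]: WARN: Managed vmgroup1:0:stop process 8088 exited with return code 6."
# The pattern is an assumption based only on those quoted lines.
FAILED_OP = re.compile(
    r"lrmd: \[\d+\]: WARN: Managed (?P<rsc>\S+):(?P<op>[a-z_]+) "
    r"process \d+ exited with return code (?P<rc>\d+)"
)


def failed_ops(lines):
    """Yield (resource, operation, return_code) for each matching lrmd warning."""
    for line in lines:
        match = FAILED_OP.search(line)
        if match:
            yield match.group("rsc"), match.group("op"), int(match.group("rc"))


def summarize(lines):
    """Count failures per (operation, return_code) pair."""
    return Counter((op, rc) for _rsc, op, rc in failed_ops(lines))


if __name__ == "__main__":
    # Usage: python tally_failed_ops.py <path-to-log>
    # The script name and the log path are whatever applies on your system.
    with open(sys.argv[1], errors="replace") as log:
        for (op, rc), count in sorted(summarize(log).items()):
            print(f"{op:10s} rc={rc} count={count}")
```

If every line of the output shows the stop operation with rc=6, that matches what is visible in the excerpt. Any other operation or return code showing up would be worth mentioning in the bugzilla entry.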
