Hi,

On Mon, Nov 23, 2009 at 12:09:02PM +0100, Matteo Chesi wrote:
> Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Mon, Nov 23, 2009 at 11:28:24AM +0100, Matteo Chesi wrote:
> >> Andrew Beekhof wrote:
> >>> On Mon, Nov 23, 2009 at 10:31 AM, Matteo Chesi <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> I've got one problem on Heartbeat in one of my production clusters.
> >>>>
> >>>> The problem is that one resource (scs-mysql) in a group of resources do
> >>>> not respond to start/stop commands through hb_gui (or crm_resource
> >>>> commands).
> >>>>
> >>>> I checked "crm_verify -L" and I found this problem:
> >>>>
> >>>> scs02:~# crm_verify -L
> >>>> crm_verify[28132]: 2009/11/23_10:26:37 ERROR: unpack_rsc_op: Remapping
> >>>> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> >>>>
> >>>>
> >>>> Please could you help me to find out what's the problem ?
> >>> Your postgres resource returned the wrong thing. Check your logs.
> >> I've got 3 logs and none of them tell me more than the message I posted.
> >
> > Perhaps the logs were rotated after the error happened? Try to
> > grep your logs for lrmd.*postgres.
> >
>
> The error happens everytime I do a "crm_verify -L" and gets the
> timestamp of that moment.

crm_verify only reports whatever is recorded in the status
section of the CIB. The error itself happened in the past and
has nothing to do with crm_verify.

> However looking for that string in /var/log I found that one past log
> shows something related my error and a particular event seems to be
> happened before this error started to repeat ...
>
> pengine[9937]: 2009/10/28_14:50:20 WARN: unpack_rsc_op: Processing
> failed op resource_scs_postgresql_monitor_0 on scs01: Error
> pengine[9937]: 2009/10/28_14:50:20 ERROR: native_add_running: Resource
> lsb::scs-postgresql:resource_scs_postgresql appears to be active on 2 nodes.

The init script is probably not LSB compliant (or postgres really
is running on both nodes). You would be better off with the OCF
resource agent.

Thanks,

Dejan

> pengine[9937]: 2009/10/28_14:50:20 ERROR: See
> http://linux-ha.org/v2/faq/resource_too_active for more information.
> pengine[9937]: 2009/10/28_14:50:20 ERROR: native_create_actions:
> Attempting recovery of resource resource_scs_postgresql
> pengine[9937]: 2009/10/28_14:50:21 ERROR: process_pe_message: Transition
> 94: ERRORs found during PE processing. PEngine Input stored in:
> /var/lib/heartbeat/pengine/pe-error-28.bz2
> mgmtd[5279]: 2009/10/28_14:50:21 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:21 ERROR: native_add_running: Resource
> lsb::scs-postgresql:resource_scs_postgresql appears to be active on 2 nodes.
> mgmtd[5279]: 2009/10/28_14:50:21 ERROR: See
> http://linux-ha.org/v2/faq/resource_too_active for more information.
> mgmtd[5279]: 2009/10/28_14:50:22 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:22 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:23 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:24 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:26 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
>
> Looking for that error in FAQ (Pacemaker one) I found:
>
> Resource is Too Active
>
> Pacemaker will try and determine what resources are active on a machine
> when it starts. To do this, it sends what we call a probe which uses the
> monitor operation of your ResourceAgent.
>
> There are two common reasons for seeing this message:
>
>     * Your resource really is active on more than one node
>           o Check you are _not_ starting it on boot
>           o Did Pacemaker suffer an internal failure? If so, please
>             check the Help:Contents page and report it
>     * Your resource doesn't implement the monitor operation correctly
>           o Make sure your Resource Agent conforms to the OCF-spec by
>             using the ocf-tester script
>
> My Init script is a LSB one, not OCF.
>
> Any Idea to solve it ? If it is an error happened in the past could I do
> some cleanup to solve it ?
>
> TIA,
> Matteo
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
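
For reference, the LSB spec defines the exit codes of an init script's
"status" action, and a probe can only get a clean answer from two of
them. Below is a minimal sketch in Python of that mapping. The function
names are made up for illustration, and treating everything other than
0 and 3 as an error is an assumption based on rc=1 being remapped to an
ERROR in the log above; it is not a copy of the cluster's actual code.

```python
import subprocess

# Exit codes the LSB spec defines for the "status" action of an init script.
LSB_STATUS = {
    0: "program is running",
    1: "program is dead and pid file exists",
    2: "program is dead and lock file exists",
    3: "program is not running",
    4: "status is unknown",
}


def classify_status(rc):
    """Classify an LSB status exit code the way a probe is assumed to read it."""
    if rc == 0:
        return "active"
    if rc == 3:
        return "inactive"
    # Assumption: anything else (such as the rc=1 in the log) counts as a failure.
    return "error"


def probe(script):
    """Run '<script> status' and return (rc, meaning, classification)."""
    rc = subprocess.call([script, "status"])
    meaning = LSB_STATUS.get(rc, "reserved or non-standard code")
    return rc, meaning, classify_status(rc)
```

Calling probe("/etc/init.d/scs-postgresql") on each node while postgres
is stopped should come back with rc 3 (the path is only a guess from the
lsb::scs-postgresql resource name). Any other code on a stopped node
would point at the script rather than at the cluster.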
