Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Nov 23, 2009 at 11:28:24AM +0100, Matteo Chesi wrote:
>> Andrew Beekhof wrote:
>>> On Mon, Nov 23, 2009 at 10:31 AM, Matteo Chesi <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I've got one problem on Heartbeat in one of my production clusters.
>>>>
>>>> The problem is that one resource (scs-mysql) in a group of resources do
>>>> not respond to start/stop commands through hb_gui (or crm_resource
>>>> commands).
>>>>
>>>> I checked "crm_verify -L" and I found this problem:
>>>>
>>>> scs02:~# crm_verify -L
>>>> crm_verify[28132]: 2009/11/23_10:26:37 ERROR: unpack_rsc_op: Remapping
>>>> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
>>>>
>>>>
>>>> Please could you help me to find out what's the problem ?
>>> Your postgres resource returned the wrong thing. Check your logs.
>> I've got 3 logs and none of them tell me more than the message I posted.
>
> Perhaps the logs were rotated after the error happened? Try to
> grep your logs for lrmd.*postgres.
>

The error appears every time I run "crm_verify -L", and it always carries
the timestamp of that moment.

However, grepping for that string in /var/log, I found that an older log
shows something related to my error, and a particular event seems to have
happened before this error started to repeat:

...
pengine[9937]: 2009/10/28_14:50:20 WARN: unpack_rsc_op: Processing failed op resource_scs_postgresql_monitor_0 on scs01: Error
pengine[9937]: 2009/10/28_14:50:20 ERROR: native_add_running: Resource lsb::scs-postgresql:resource_scs_postgresql appears to be active on 2 nodes.
pengine[9937]: 2009/10/28_14:50:20 ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more information.
pengine[9937]: 2009/10/28_14:50:20 ERROR: native_create_actions: Attempting recovery of resource resource_scs_postgresql
pengine[9937]: 2009/10/28_14:50:21 ERROR: process_pe_message: Transition 94: ERRORs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-error-28.bz2
mgmtd[5279]: 2009/10/28_14:50:21 ERROR: unpack_rsc_op: Remapping resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
mgmtd[5279]: 2009/10/28_14:50:21 ERROR: native_add_running: Resource lsb::scs-postgresql:resource_scs_postgresql appears to be active on 2 nodes.
mgmtd[5279]: 2009/10/28_14:50:21 ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more information.
mgmtd[5279]: 2009/10/28_14:50:22 ERROR: unpack_rsc_op: Remapping resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
mgmtd[5279]: 2009/10/28_14:50:22 ERROR: unpack_rsc_op: Remapping resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
mgmtd[5279]: 2009/10/28_14:50:23 ERROR: unpack_rsc_op: Remapping resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
mgmtd[5279]: 2009/10/28_14:50:24 ERROR: unpack_rsc_op: Remapping resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
mgmtd[5279]: 2009/10/28_14:50:26 ERROR: unpack_rsc_op: Remapping resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR

Looking that error up in the FAQ (the Pacemaker one), I found:

  Resource is Too Active

  Pacemaker will try and determine what resources are active on a machine
  when it starts. To do this, it sends what we call a probe which uses the
  monitor operation of your ResourceAgent.

  There are two common reasons for seeing this message:

  * Your resource really is active on more than one node
      o Check you are _not_ starting it on boot
      o Did Pacemaker suffer an internal failure? If so, please check the
        Help:Contents page and report it
  * Your resource doesn't implement the monitor operation correctly
      o Make sure your Resource Agent conforms to the OCF-spec by using
        the ocf-tester script

My init script is an LSB one, not an OCF one.

Any idea how to solve this? If the error is left over from something that
happened in the past, could I do some cleanup to clear it?

TIA,
Matteo
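
Since the FAQ's ocf-tester advice does not apply to an LSB script, here is a
minimal sketch of how the "status" action could be checked by hand instead.
It lists the exit codes the LSB specification defines for "status" and
assumes the cluster treats 0 as running, 3 as stopped and anything else as an
error, which would match the rc=1 in the logs above. The function names and
the /etc/init.d/scs-postgresql path are my assumptions, not something taken
from the cluster configuration.

```python
import subprocess

# Exit codes defined by the LSB specification for the "status" action.
LSB_STATUS = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def classify_probe(rc):
    """Assumed cluster view of an LSB status code: active, inactive or error."""
    if rc == 0:
        return "active"
    if rc == 3:
        return "inactive"
    return "error"


def describe(rc):
    """Human-readable summary of a status exit code."""
    meaning = LSB_STATUS.get(rc, "reserved or non-standard code")
    return "rc=%d (%s) -> %s" % (rc, meaning, classify_probe(rc))


def run_status(script="/etc/init.d/scs-postgresql"):
    """Run '<script> status' and describe its exit code (path is an assumption)."""
    rc = subprocess.call([script, "status"])
    return describe(rc)
```

Calling run_status() on each node while the service is stopped should report
"inactive" if the script is compliant; a result of "error" on a stopped
service would suggest the script's "status" action is what confuses the probe.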
