Hi,

On Mon, Nov 23, 2009 at 12:09:02PM +0100, Matteo Chesi wrote:
> Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Mon, Nov 23, 2009 at 11:28:24AM +0100, Matteo Chesi wrote:
> >> Andrew Beekhof wrote:
> >>> On Mon, Nov 23, 2009 at 10:31 AM, Matteo Chesi <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> I have a problem with Heartbeat on one of my production clusters.
> >>>>
> >>>> The problem is that one resource (scs-mysql) in a group of resources does
> >>>> not respond to start/stop commands through hb_gui (or crm_resource
> >>>> commands).
> >>>>
> >>>> I checked "crm_verify -L" and I found this problem:
> >>>>
> >>>> scs02:~# crm_verify -L
> >>>> crm_verify[28132]: 2009/11/23_10:26:37 ERROR: unpack_rsc_op: Remapping
> >>>> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> >>>>
> >>>>
> >>>> Could you please help me find out what the problem is?
> >>> Your postgres resource returned the wrong thing. Check your logs.
> >> I've got 3 logs and none of them tell me more than the message I posted.
> > 
> > Perhaps the logs were rotated after the error happened? Try to
> > grep your logs for lrmd.*postgres.
> > 
> 
> The error appears every time I run "crm_verify -L" and carries the
> timestamp of that moment.

crm_verify only reports here whatever is recorded in the status
section of the CIB. That error happened in the past though and
has nothing to do with crm_verify.
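
As a rough sketch of how that stale record could be looked at and
cleared: the helper below only prints the commands, it does not run
them. show_cleanup_steps is a made-up name; the commands themselves
(cibadmin -Q -o status, crm_resource -C, crm_verify -L) are the stock
tools, but check the options against the man pages of your version
before running anything on a production cluster.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands for inspecting and
# clearing a stale failed probe. show_cleanup_steps is a hypothetical
# helper name, not part of Heartbeat or Pacemaker.
show_cleanup_steps() {
    rsc="$1"    # resource id as it appears in the CIB
    node="$2"   # node whose status section holds the failed op
    # Dump the status section, where the old monitor_0 result is kept.
    echo "cibadmin -Q -o status"
    # Ask the cluster to forget the resource's history on that node.
    echo "crm_resource -C -r $rsc -H $node"
    # Re-check the live CIB afterwards.
    echo "crm_verify -L"
}
```

For the case above that would be
"show_cleanup_steps resource_scs_postgresql scs01". A cleanup makes
the cluster probe the resource again, so if the init script still
answers the probe with rc=1 the same error will presumably come back.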

> However, searching /var/log for that string, I found an older log that
> shows something related to my error; a particular event seems to have
> happened before this error started to repeat ...
> 
> pengine[9937]: 2009/10/28_14:50:20 WARN: unpack_rsc_op: Processing
> failed op resource_scs_postgresql_monitor_0 on scs01: Error
> pengine[9937]: 2009/10/28_14:50:20 ERROR: native_add_running: Resource
> lsb::scs-postgresql:resource_scs_postgresql appears to be active on 2 nodes.

The init script is probably not LSB compliant (or postgres really is
running on both nodes). You would be better off with the OCF
resource agent.
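
A rough way to see what the init script answers to a probe, as a
sketch: for an lsb resource the monitor operation comes down to the
script's "status" action, as far as I know, and the LSB spec defines
exit code 0 as running and 3 as not running. Codes 1 and 2 mean the
program is dead but a pid or lock file is still there, which would be
consistent with the rc=1 in the log. check_lsb_status is a made-up
helper name, and the /etc/init.d path in the note below is an
assumption about where the script lives.

```shell
#!/bin/sh
# Sketch: run "<init script> status" and interpret the exit code the
# way the LSB spec defines it. check_lsb_status is a hypothetical
# helper name, not part of Heartbeat.
check_lsb_status() {
    script="$1"
    # Only the exit code matters here, so discard the script's output.
    "$script" status >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)   echo "rc=0: running" ;;
        3)   echo "rc=3: stopped" ;;
        1|2) echo "rc=$rc: dead, stale pid or lock file" ;;
        *)   echo "rc=$rc: unknown or non-LSB answer" ;;
    esac
}
```

Run it on both nodes, for example
"check_lsb_status /etc/init.d/scs-postgresql". A compliant script
should report rc=3 on the node where postgres is stopped and rc=0 on
the node where it is running.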

Thanks,

Dejan

> pengine[9937]: 2009/10/28_14:50:20 ERROR: See
> http://linux-ha.org/v2/faq/resource_too_active for more information.
> pengine[9937]: 2009/10/28_14:50:20 ERROR: native_create_actions:
> Attempting recovery of resource resource_scs_postgresql
> pengine[9937]: 2009/10/28_14:50:21 ERROR: process_pe_message: Transition
> 94: ERRORs found during PE processing. PEngine Input stored in:
> /var/lib/heartbeat/pengine/pe-error-28.bz2
> mgmtd[5279]: 2009/10/28_14:50:21 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:21 ERROR: native_add_running: Resource
> lsb::scs-postgresql:resource_scs_postgresql appears to be active on 2 nodes.
> mgmtd[5279]: 2009/10/28_14:50:21 ERROR: See
> http://linux-ha.org/v2/faq/resource_too_active for more information.
> mgmtd[5279]: 2009/10/28_14:50:22 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:22 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:23 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:24 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> mgmtd[5279]: 2009/10/28_14:50:26 ERROR: unpack_rsc_op: Remapping
> resource_scs_postgresql_monitor_0 (rc=1) on scs01 to an ERROR
> 
> Looking up that error in the FAQ (the Pacemaker one), I found:
> 
>   Resource is Too Active
> 
> Pacemaker will try and determine what resources are active on a machine
> when it starts. To do this, it sends what we call a probe which uses the
> monitor operation of your ResourceAgent.
> 
> There are two common reasons for seeing this message:
> 
>     * Your resource really is active on more than one node
>           o Check you are _not_ starting it on boot
>           o Did Pacemaker suffer an internal failure? If so, please
> check the Help:Contents page and report it
>     * Your resource doesn't implement the monitor operation correctly
>           o Make sure your Resource Agent conforms to the OCF-spec by
> using the ocf-tester script
> 
> 
> 
> 
> My init script is an LSB one, not OCF.
> 
> Any idea how to solve it? If the error happened in the past, could I do
> some cleanup to solve it?
> 
> TIA,
> Matteo
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
