On 4/24/07, Doug Knight <[EMAIL PROTECTED]> wrote:
I settled on using cibadmin for now to change target_role. Now, when I
add the Filesystem resource, and try to start it up, it won't start. I'm
getting the following in my debug log:

pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Processing failed op (rsc_drbd_7788:0_monitor_0) for rsc_drbd_7788:0 on arc-dknightlx
pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Processing failed op (rsc_drbd_7788:1_monitor_0) for rsc_drbd_7788:1 on arc-dknightlx
pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Processing failed op (fs_mirror_start_0) for fs_mirror on arc-dknightlx
pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Handling failed start for fs_mirror on arc-dknightlx
pengine[24246]: 2007/04/24_09:40:38 info: determine_online_status: Node arc-tkincaidlx.wsicorp.com is online
pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Processing failed op (rsc_drbd_7788:0_monitor_0) for rsc_drbd_7788:0 on arc-tkincaidlx.wsicorp.com
pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Processing failed op (rsc_drbd_7788:1_monitor_0) for rsc_drbd_7788:1 on arc-tkincaidlx.wsicorp.com
pengine[24246]: 2007/04/24_09:40:38 info: clone_print: Master/Slave Set: ms_drbd_7788
pengine[24246]: 2007/04/24_09:40:38 info: native_print: rsc_drbd_7788:0     (heartbeat::ocf:drbd):  Master arc-dknightlx
pengine[24246]: 2007/04/24_09:40:38 info: native_print: rsc_drbd_7788:1     (heartbeat::ocf:drbd):  Slave arc-tkincaidlx.wsicorp.com
pengine[24246]: 2007/04/24_09:40:38 info: group_print: Resource Group: grp_pgsql_mirror
pengine[24246]: 2007/04/24_09:40:38 info: native_print:     fs_mirror (heartbeat::ocf:Filesystem):    Stopped
pengine[24246]: 2007/04/24_09:40:38 info: master_color: Promoting rsc_drbd_7788:0
pengine[24246]: 2007/04/24_09:40:38 info: master_color: Promoted 1 instances of a possible 1 to master
pengine[24246]: 2007/04/24_09:40:38 notice: NoRoleChange: Leave resource rsc_drbd_7788:0        (arc-dknightlx)
pengine[24246]: 2007/04/24_09:40:38 notice: NoRoleChange: Leave resource rsc_drbd_7788:1        (arc-tkincaidlx.wsicorp.com)
pengine[24246]: 2007/04/24_09:40:38 notice: NoRoleChange: Leave resource rsc_drbd_7788:0        (arc-dknightlx)
pengine[24246]: 2007/04/24_09:40:38 notice: NoRoleChange: Leave resource rsc_drbd_7788:1        (arc-tkincaidlx.wsicorp.com)
pengine[24246]: 2007/04/24_09:40:38 info: master_color: Promoted 1 instances of a possible 1 to master
pengine[24246]: 2007/04/24_09:40:38 info: master_color: Promoted 1 instances of a possible 1 to master
pengine[24246]: 2007/04/24_09:40:38 WARN: native_color: Resource fs_mirror cannot run anywhere

No ocf_log debug output is getting triggered from the drbd ocf script,
yet HA is saying something about "Processing failed op" on what looks
like a monitor command. Can anyone tell me what this means?

The failed *_monitor_0 ops basically mean that the resource reported
itself as running when the node started. This usually indicates either
a bug in the RA or that the resource was started from init.
Certainly, if the RA reported the resource as running when it wasn't,
that could easily cause resources that depend on the broken one to fail.
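
To make the naming in the log easier to read, here is a minimal Python
sketch that splits an op key such as rsc_drbd_7788:0_monitor_0 into its
parts. It assumes the key layout is <resource>_<operation>_<interval>,
which matches the keys in the log above but is not spelled out in this
thread. The helper names parse_op_key and is_startup_monitor are
hypothetical.

```python
import re

# Assumed layout: <resource>_<operation>_<interval>, e.g. fs_mirror_start_0.
# The resource part may itself contain underscores and a clone suffix (":0").
OP_KEY = re.compile(r"^(?P<resource>.+)_(?P<operation>[a-z]+)_(?P<interval>\d+)$")


def parse_op_key(key):
    """Split an op key into (resource, operation, interval)."""
    match = OP_KEY.match(key)
    if match is None:
        raise ValueError("unrecognised op key: %r" % key)
    return (
        match.group("resource"),
        match.group("operation"),
        int(match.group("interval")),
    )


def is_startup_monitor(key):
    """True for *_monitor_0 keys, the kind discussed in the reply above."""
    _, operation, interval = parse_op_key(key)
    return operation == "monitor" and interval == 0


if __name__ == "__main__":
    for key in ("rsc_drbd_7788:0_monitor_0", "fs_mirror_start_0"):
        print(key, parse_op_key(key), is_startup_monitor(key))
```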

A failed *_start_0 is quite serious: the resource will never be
started on that node again until you clean it up manually:
http://linux-ha.org/v2/faq/manual_recovery

A failed *_stop_0 is even worse, since we can't start the resource
anywhere else until it is stopped in its current location. If stonith
is enabled, we will shoot the node and continue; if not, we'll wait
until you clean up the node manually (see the link above).
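
As a rough aid for triage, the sketch below scans pengine log lines for
"Processing failed op" entries and attaches the consequence described
above for each kind of failure. It assumes each log entry is on a single
line (unwrapped) and that the op key ends in _<operation>_<interval>;
the function name summarize_failures and the wording of the CONSEQUENCE
table are my own, not part of Heartbeat.

```python
import re

# Matches the WARN entries emitted by unpack_rsc_op in the log above.
FAILED_OP = re.compile(
    r"Processing failed op \((?P<key>\S+)\) for (?P<rsc>\S+) on (?P<node>\S+)"
)

# Paraphrase of the reply in this thread, keyed by operation name.
CONSEQUENCE = {
    "monitor": "resource reported itself running at node start; check the RA and init",
    "start": "resource stays off this node until it is cleaned up manually",
    "stop": "resource cannot start elsewhere until stopped, fenced, or cleaned up",
}


def summarize_failures(lines):
    """Return (node, resource, operation, consequence) for each failed op."""
    results = []
    for line in lines:
        match = FAILED_OP.search(line)
        if match is None:
            continue
        parts = match.group("key").rsplit("_", 2)
        if len(parts) != 3:
            continue  # key does not follow the assumed layout
        operation = parts[1]
        results.append((
            match.group("node"),
            match.group("rsc"),
            operation,
            CONSEQUENCE.get(operation, "not covered in this thread"),
        ))
    return results


if __name__ == "__main__":
    sample = [
        "pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Processing "
        "failed op (rsc_drbd_7788:0_monitor_0) for rsc_drbd_7788:0 on arc-dknightlx",
        "pengine[24246]: 2007/04/24_09:40:38 WARN: unpack_rsc_op: Processing "
        "failed op (fs_mirror_start_0) for fs_mirror on arc-dknightlx",
    ]
    for row in summarize_failures(sample):
        print(row)
```

Run against the log in this thread, the fs_mirror_start_0 entry is the
one that explains "Resource fs_mirror cannot run anywhere".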

Also, I have not explicitly defined operations on any of my resources
yet. Could that be part of the problem?

No. For most people the defaults are enough, and the only thing they
need to specify is the recurring monitors.
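
For illustration only, this Python sketch builds the kind of
<operations> fragment that would declare a recurring monitor for a
resource. The element and attribute names follow my understanding of the
Heartbeat v2 CIB format and should be checked against your version's
DTD; the id suffix and the interval and timeout values are made up for
the example.

```python
import xml.etree.ElementTree as ET


def monitor_op(resource_id, interval, timeout):
    """Build an <operations> element holding one recurring monitor op.

    The id, interval and timeout values are placeholders chosen by the
    caller; nothing here is taken from the poster's configuration.
    """
    operations = ET.Element("operations")
    ET.SubElement(operations, "op", {
        "id": "%s_monitor" % resource_id,  # hypothetical id scheme
        "name": "monitor",
        "interval": interval,
        "timeout": timeout,
    })
    return operations


if __name__ == "__main__":
    fragment = monitor_op("fs_mirror", "120s", "60s")
    print(ET.tostring(fragment, encoding="unicode"))
```

The printed fragment would go inside the resource's definition in the
CIB, for example via cibadmin as used earlier in this thread.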
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems