On 2008-08-13T17:11:54, Keisuke MORI <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've run into unexpected behavior during our regression test
> for the 2.1.4 release.
>
> When the stop of a resource with on_fail=block fails, the resource
> appears to be running on both nodes according to the log and
> crm_mon.
>
> In 2.1.3 this did not happen and everything worked as expected;
> the problem occurs in the current lha-2.1 (0d61ad37ee9a).

Yeah, this looks like a real bug.

Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: unpack_nodes: Node cupertino is in standby-mode
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: determine_online_status: Node sunnyvale is online
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: determine_online_status: Node cupertino is standby
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: common_apply_stickiness: Setting failure stickiness for group1-dummy1 on cupertino: -1000000
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: unpack_rsc_op: Remapping group1-dummy1_stop_0 (rc=1) on cupertino to an ERROR (expected 0)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: unpack_rsc_op: Processing failed op group1-dummy1_stop_0 on cupertino: Error
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: group_print: Resource Group: non_clone_group1
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print: group1-dummy1 (ocf::heartbeat:Dummy1): Started cupertino (unmanaged) FAILED
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print: group1-dummy2 (ocf::heartbeat:Dummy2): Stopped
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: NoRoleChange: Leave resource group1-dummy1 (Started sunnyvale)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_stop_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: StartRsc: sunnyvale Start group1-dummy1
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: RecurringOp: Start recurring monitor (5s) for group1-dummy1 on sunnyvale
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: StartRsc: sunnyvale Start group1-dummy2
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: RecurringOp: Start recurring monitor (5s) for group1-dummy2 on sunnyvale

Following the failure to stop, the resource is considered unmanaged and
failed on cupertino, which is correct.

Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: NoRoleChange: Leave resource group1-dummy1 (Started sunnyvale)

This is the crucial line: the resource has never been started on
sunnyvale, and this is where the bug begins.

Because the policy engine treats it as started there, it then goes on
to schedule monitors etc, which is of course incorrect.
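
For anyone who wants to check their own logs for the same symptom, here
is a rough Python sketch that compares the node reported by native_print
with the node named in the NoRoleChange "Leave resource" line. The
regular expressions are guesses based only on the excerpts quoted in
this mail, and find_location_mismatches is a made-up helper, not
anything shipped with heartbeat. It also assumes each log entry sits on
a single line.

```python
import re

# Patterns are assumptions derived from the log excerpts in this thread,
# not from the pengine source.
STATUS_RE = re.compile(
    r"native_print:\s+(\S+)\s+\(\S+\):\s+Started\s+(\S+)")
LEAVE_RE = re.compile(
    r"NoRoleChange:\s+Leave\s+resource\s+(\S+)\s+\(Started\s+(\S+)\)")


def find_location_mismatches(lines):
    """Return (resource, reported_node, planned_node) tuples for every
    'Leave resource' line that names a different node than the status
    line printed earlier for the same resource."""
    reported = {}    # resource -> node from the most recent native_print
    mismatches = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            reported[m.group(1)] = m.group(2)
            continue
        m = LEAVE_RE.search(line)
        if m:
            rsc, node = m.group(1), m.group(2)
            if rsc in reported and reported[rsc] != node:
                mismatches.append((rsc, reported[rsc], node))
    return mismatches


# Entries taken from the log above, with the timestamp and host dropped.
SAMPLE = [
    "pengine: [18478]: notice: native_print: group1-dummy1 "
    "(ocf::heartbeat:Dummy1): Started cupertino (unmanaged) FAILED",
    "pengine: [18478]: notice: native_print: group1-dummy2 "
    "(ocf::heartbeat:Dummy2): Stopped",
    "pengine: [18478]: notice: NoRoleChange: Leave resource "
    "group1-dummy1 (Started sunnyvale)",
]

print(find_location_mismatches(SAMPLE))
# prints [('group1-dummy1', 'cupertino', 'sunnyvale')]
```

An empty list means the two views agree. This only detects the symptom
in the log output; it says nothing about where in the policy engine the
wrong location comes from.
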
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde