On 2008-08-13T17:11:54, Keisuke MORI <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I ran into unexpected behavior during our regression test
> for the 2.1.4 release.
> 
> When the stop of a resource with on_fail=block fails, the resource
> appears to be running on both nodes according to the log and
> crm_mon.
> 
> In 2.1.3 this did not happen and everything worked as expected;
> the problem occurs in the current lha-2.1 (0d61ad37ee9a).

Yeah, this looks like a real bug.

Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: unpack_nodes: Node cupertino is in standby-mode
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: determine_online_status: Node sunnyvale is online
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: determine_online_status: Node cupertino is standby
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: common_apply_stickiness: Setting failure stickiness for group1-dummy1 on cupertino: -1000000
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: unpack_rsc_op: Remapping group1-dummy1_stop_0 (rc=1) on cupertino to an ERROR (expected 0)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: unpack_rsc_op: Processing failed op group1-dummy1_stop_0 on cupertino: Error
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: group_print: Resource Group: non_clone_group1
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print:     group1-dummy1     (ocf::heartbeat:Dummy1):        Started cupertino (unmanaged) FAILED
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print:     group1-dummy2     (ocf::heartbeat:Dummy2):        Stopped
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: NoRoleChange: Leave resource group1-dummy1  (Started sunnyvale)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_stop_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: StartRsc:  sunnyvale        Start group1-dummy1
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: RecurringOp:  Start recurring monitor (5s) for group1-dummy1 on sunnyvale
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: StartRsc:  sunnyvale        Start group1-dummy2
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: RecurringOp:  Start recurring monitor (5s) for group1-dummy2 on sunnyvale

Following the failure to stop, the resource is considered unmanaged and
failed on cupertino (which is correct).

Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: NoRoleChange: Leave resource group1-dummy1  (Started sunnyvale)

This is the crucial line: the resource has never been started on
sunnyvale, so this is where the bug begins.

Because the policy engine believes the resource is started on sunnyvale,
it then goes on to schedule recurring monitors etc. there, which is of
course incorrect.
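
For anyone who wants to check their own logs for the same symptom, here
is a minimal Python sketch that flags a resource reported as started on
more than one node. The two regular expressions are inferred only from
the native_print and NoRoleChange lines quoted in this message; they are
an assumption, not a documented pengine log format, and the function
names are made up for illustration.

```python
import re
from collections import defaultdict

# Patterns inferred from the log lines quoted in this message only
# (an assumption, not a documented pengine log format).
NATIVE_PRINT = re.compile(r"native_print:\s+(\S+)\s+\(\S+\):\s+Started (\S+)")
NO_ROLE_CHANGE = re.compile(r"NoRoleChange: Leave resource (\S+)\s+\(Started (\S+)\)")


def nodes_per_resource(lines):
    """Map each resource to the set of nodes the log claims it is started on."""
    seen = defaultdict(set)
    for line in lines:
        for pattern in (NATIVE_PRINT, NO_ROLE_CHANGE):
            match = pattern.search(line)
            if match:
                resource, node = match.groups()
                seen[resource].add(node)
    return dict(seen)


def conflicting_resources(lines):
    """Return only the resources reported as started on more than one node."""
    return {
        rsc: nodes
        for rsc, nodes in nodes_per_resource(lines).items()
        if len(nodes) > 1
    }


# Three lines taken from the log above, with the mail wrapping undone.
SAMPLE = [
    "Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print:     "
    "group1-dummy1     (ocf::heartbeat:Dummy1):        Started cupertino (unmanaged) FAILED",
    "Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print:     "
    "group1-dummy2     (ocf::heartbeat:Dummy2):        Stopped",
    "Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: NoRoleChange: "
    "Leave resource group1-dummy1  (Started sunnyvale)",
]

if __name__ == "__main__":
    for rsc, nodes in conflicting_resources(SAMPLE).items():
        print(f"{rsc}: reported as started on {sorted(nodes)}")
```

Note that the patterns expect each log entry on a single line, so entries
wrapped by a mail client would have to be rejoined before being fed in.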


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
