On 2007-07-04T15:04:36, Dominik Klein <[EMAIL PROTECTED]> wrote:
> 1: crm_resource -r ms-r0 -v 'started' -p target_role
> 1: crm_resource -r fs0 -v 'started' -p target_role
Sure you didn't forget a --meta here?
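For clarity: target_role on a resource is a meta attribute, so (assuming
the crm_resource syntax of heartbeat 2.x; check crm_resource --help on
your build) the commands would look like this:

```shell
# Sketch only: set target_role as a *meta* attribute
crm_resource --meta -r ms-r0 -p target_role -v started
crm_resource --meta -r fs0   -p target_role -v started
```

Without --meta the value may end up as an instance attribute, which the
PE will not interpret as a role request.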
> <crm_mon shows r0 "started" for both nodes -> not good>
>
> 1: drbdadm state r0
> Unknown/TOO_LARGE
> <OCF script needs to be changed to recognize this (maybe new drbd8)
> state after just the module being loaded>
> <done>
Probably this is screwing up the initial start-up probe we do. It
appears drbd8 doesn't quite work yet, which doesn't come as a surprise.
You will need to make a few more changes.
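For example, the RA's monitor/status logic could be taught to treat the
new "Unknown/..." output (module loaded, device not yet configured) as
"not running". A minimal, hypothetical sketch of such a mapping — the
parse_drbd_state helper is illustrative, not the actual drbd RA code:

```shell
#!/bin/sh
# Illustrative helper (not the real drbd RA): classify the output of
# `drbdadm state <res>` into a coarse role for the monitor action.
parse_drbd_state() {
    case "$1" in
        Primary/*)               echo "master"  ;;  # promoted on this node
        Secondary/*)             echo "slave"   ;;  # running, not promoted
        Unknown/*|Unconfigured*) echo "stopped" ;;  # module loaded, device unconfigured
        *)                       echo "error"   ;;  # anything unexpected
    esac
}

parse_drbd_state "Unknown/TOO_LARGE"   # prints "stopped"
parse_drbd_state "Primary/Secondary"   # prints "master"
```

The monitor action would then return OCF_NOT_RUNNING for "stopped"
instead of failing on an unrecognized state string.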
> So except for changing and copying the script, I started over from
> reboot up to target_role=started for fs0
> <now crm_mon show r0:0 on acd-xen03 is master>
> <fs0 is mounted on acd-xen03>
> <2 online nodes, *4* resources>
4 resources? Weird. It might be a bit late in the game to ask this, but
which heartbeat version, exactly, are you running?
Your drbd RA seems to still put the master_slave preference into the
configuration section instead of a transient node attribute, which
indicates you're not running our latest code?
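In the newer code, the RA announces its master preference at runtime as
a transient node attribute via crm_master (a wrapper around
crm_attribute, visible in your logs; exact flags depend on your
heartbeat version), roughly like this:

```shell
# Sketch: advertise the master preference as a transient attribute
crm_master -v 75    # run by the RA on the node that should be promoted
crm_master -D       # delete the preference again, e.g. on stop/demote
```

That way the preference disappears with the node instead of lingering in
the configuration section.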
> debug: unpack_rsc_order: r0_before_fs0: ms-r0.promote after fs0.start
> (symmetrical)
> debug: unpack_rsc_order: r1_before_fs1: ms-r1.promote after fs1.start
> (symmetrical)
> debug: cib_native_signoff: Signing out of the CIB Service
>
> <r1:2 looks suspicious - no idea where this comes from>
That, too, reminds me of a bug which has been fixed in the past ...
> The notify action times out (20s).
20s is low, you should increase it.
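For instance, the notify op timeout can be raised in the master/slave
resource's operations list in the CIB (id and value here are only
illustrative):

```
<op id="ms-r0-op-notify" name="notify" timeout="120s"/>
```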
> Jul 4 14:45:59 ACD-xen03 drbd_master_slave[7406]: [7435]: DEBUG: DK
> before crm_master -v 75
> Jul 4 14:45:59 ACD-xen03 drbd_master_slave[7406]: [7436]: DEBUG: r1:
> Calling /usr/sbin/crm_master -v 75
> ########### notice: +20s
But true, this is weird; it shouldn't take that long.
> Please note that this behaviour is not dependent on my r0 or r1
> resource. If I start out with r0, r0 works and r1 faults. If I start the
> other way around with r1, then r0 will fault.
>
> Maybe you can still help me figure this out.
Hm, I don't have a good idea off the top of my head. I'd need to try to
reproduce this on my own cluster.
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems