* Lars Ellenberg <[EMAIL PROTECTED]> [20080322 06:42]:
> On Thu, Mar 20, 2008 at 09:59:29PM +0100, Andreas Kurz wrote:
> > On Thu, Mar 20, 2008 at 6:43 PM, Jean-Francois Malouin
> > <[EMAIL PROTECTED]> wrote:
> > > Hi Dominik,
> > >
> > >  * Dominik Klein <[EMAIL PROTECTED]> [20080320 02:23]:
> > >
> > > > Jean-Francois Malouin wrote:
> > >  > >I thought I had it nailed but still no go.
> > >
> > >  [...]
> > >
> > >  I'm replying late. It's that kind of a day:
> > >  network failure at home and power failure at work :)
> > >
> > >
> > >  > >
> > >  > The xml looks good to me.
> > >
> > >  glad to know, I'm quite new at this :)

[...]

> > >  > >op drbd_id:0_promote_0 on feeble-0: Error
> > >  >
> > >  > Find out why this failed.
> > >
> > >  Can't see why and how by just looking at the debug logs...
> > >  Any way to increase the verbosity there?
> > 
> > Add 'debug 1' to your ha.cf to increase verbosity. As a side note ...
> > when using a serial heartbeat channel in combination with crm, use a
> > baud rate as high as possible e.g. 115200 or higher.
> 
> chances are that you use dopd, with heartbeat 2.1.3.
> unfortunately while fixing other shortcomings of dopd in 2.1.2,
> we (LinBit) broke the real failover behaviour.

as a matter of fact, yes, I do as per the drbd user guide.

> indication: in the kernel logs would be something like
>  "drbd: refusing to be primary while peer is not outdated".
> easy way to find out: configure "fencing ignore" in drbd,
> and see if heartbeat then behaves as expected.

I haven't seen any kernel messages like the above, and it seems to me
that dopd behaves correctly (running Etch and 2.6.22.2-i686-smp). I
tested by bringing down the replication link: if I dd a big file on the
backing device on the primary, I can see the resync happening when the
link is brought up again and the connection state changes back to
'Connected'.
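
For anyone repeating this check, the "indication" Lars mentions can be
searched for mechanically instead of by eye. Below is a minimal sketch in
Python. It assumes the kernel log is plain text and that the message
contains the exact wording quoted above. That wording, the helper name
find_dopd_refusals and the log path in the usage note are all assumptions
for illustration, not part of DRBD or heartbeat.

```python
# Message text as quoted in this thread; the exact wording is an
# assumption and may differ between DRBD versions.
DOPD_REFUSAL = "refusing to be primary while peer is not outdated"


def find_dopd_refusals(lines):
    """Return every log line (newline stripped) containing the refusal message."""
    return [line.rstrip("\n") for line in lines if DOPD_REFUSAL in line]
```

Usage, assuming Debian's default kernel log location:
with open("/var/log/kern.log") as f: print(find_dopd_refusals(f)).
An empty list is consistent with what I see here, i.e. dopd is not the
component refusing the promotion.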

I think I have found my problem, though: I hadn't added the resource
location constraint for pingd. I added this snippet to the CIB to keep
the master-slave drbd resource off any node that has lost connectivity,
and so far in my tests it seems to work:

<rsc_location id="drbd_id:connected" rsc="ms-drbd_id">
  <rule id="drbd_id:connected:rule" score="-INFINITY" boolean_op="or">
    <expression id="drbd_id:connected:expr:undefined" attribute="pingd" operation="not_defined"/>
    <expression id="drbd_id:connected:expr:zero" attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
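
The logic of this rule can be mirrored in a few lines of Python to make
its effect explicit. Because of boolean_op="or", a node gets -INFINITY if
pingd is either undefined or <= 0. This is only a sketch of the rule's
semantics, not of how the CRM evaluates it. Using -math.inf as a stand-in
for -INFINITY, treating pingd as a number, and the helper names
connected_rule_score and eligible_nodes are all assumptions for
illustration.

```python
import math

# Stand-in for the CIB's -INFINITY score (an assumption for illustration).
MINUS_INFINITY = -math.inf


def connected_rule_score(node_attrs):
    """Mirror the "drbd_id:connected:rule" rule for a single node.

    The rule uses boolean_op="or", so it matches if either expression does:
      - pingd is not defined (operation="not_defined"), or
      - pingd <= 0           (operation="lte" value="0").
    A matching rule contributes -INFINITY; otherwise this sketch returns 0.
    """
    pingd = node_attrs.get("pingd")
    if pingd is None or float(pingd) <= 0:
        return MINUS_INFINITY
    return 0


def eligible_nodes(nodes):
    """Return the sorted names of nodes the rule does not ban for ms-drbd_id."""
    return sorted(name for name, attrs in nodes.items()
                  if connected_rule_score(attrs) != MINUS_INFINITY)
```

For example, with feeble-0 reporting pingd="100", a hypothetical node-b
reporting pingd="0" and a hypothetical node-c with no pingd attribute at
all, only feeble-0 remains eligible.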

When I put the master in standby, the resources are correctly
migrated. The same goes when I power off the master or yank the eth0
network cable. I still have issues with failing back, though, as it
seems that 'auto_failback off' is not honored correctly.

I'm not sure what to do next. Should I move to pacemaker?

thanks to all
jf

> for more information, workarounds and status of the fix
> please have a look at the thread around
>  http://thread.gmane.org/gmane.linux.network.drbd/14345/focus=14372
> 
> sorry for putting out broken software. the blame is on me: I did not
> double check QA, which would have been my job before committing anything.
> 
> the fix we have pending undergoes thorough regression testing this time.
> 
> -- 
> : Lars Ellenberg                            Tel +43-1-8178292-55 :
> : LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
> : Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

-- 
<° ><
