* Lars Ellenberg <[EMAIL PROTECTED]> [20080322 06:42]:
> On Thu, Mar 20, 2008 at 09:59:29PM +0100, Andreas Kurz wrote:
> > On Thu, Mar 20, 2008 at 6:43 PM, Jean-Francois Malouin
> > <[EMAIL PROTECTED]> wrote:
> > > Hi Dominik,
> > >
> > > * Dominik Klein <[EMAIL PROTECTED]> [20080320 02:23]:
> > >
> > > > Jean-Francois Malouin wrote:
> > > > >I thought I had it nailed but still no go.
> > >
> > > [...]
> > >
> > > I'm replying late. It's that kind of a day:
> > > network failure at home and power failure at work :)
> > >
> > >
> > > > >
> > > > The xml looks good to me.
> > >
> > > glad to know, I'm quite new at this :)
[...]
> > > > >op drbd_id:0_promote_0 on feeble-0: Error
> > > >
> > > > Find out why this failed.
> > >
> > > Can't see why and how by just looking at the debug logs...
> > > anyway to increase verbosity in there?
> >
> > Add 'debug 1' to your ha.cf to increase verbosity. As a side note ...
> > when using a serial heartbeat channel in combination with crm, use a
> > baud rate as high as possible e.g. 115200 or higher.
>
> chances are that you use dopd, with heartbeat 2.1.3.
> unfortunately while fixing other shortcomings of dopd in 2.1.2,
> we (LinBit) broke the real failover behaviour.
As a matter of fact, yes, I do use dopd, as per the DRBD user guide.
> indication: in the kernel logs would be something like
> "drbd: refusing to be primary while peer is not outdated".
> easy way to find out: configure "fencing ignore" in drbd,
> and see if heartbeat then behaves as expected.
I haven't seen any kernel messages like the one above, and it seems to
me that dopd behaves (running Etch and 2.6.22.2-i686-smp). I tested by
bringing down the replication link: if I dd a big file on the backing
device on the primary, I can see the device resyncing when the link is
brought up again, and the connection state changes back to 'Connected'.
I think I have found my problem though: I had not added the resource
location constraint for pingd. I added this snippet to the CIB to keep
the master-slave drbd resource off any node that has lost connectivity,
and so far in my tests it seems to work:
<rsc_location id="drbd_id:connected" rsc="ms-drbd_id">
  <rule id="drbd_id:connected:rule" score="-INFINITY" boolean_op="or">
    <expression id="drbd_id:connected:expr:undefined" attribute="pingd"
                operation="not_defined"/>
    <expression id="drbd_id:connected:expr:zero" attribute="pingd"
                operation="lte" value="0"/>
  </rule>
</rsc_location>
When I put the master in standby, the resources are correctly
migrated. The same goes when I power off the master or yank the eth0
network cable. I still have issues with failover, though, as it seems
that 'auto_failback off' is not honored correctly.
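For anyone reading along later, here is how I understand the pingd rule
in the snippet above, written as a small Python sketch. It is only a toy
model of the rule's logic, not how the CRM actually evaluates it; the
function name and the sample attribute values are made up for
illustration.

```python
from typing import Mapping

MINUS_INFINITY = float("-inf")


def connectivity_score(node_attrs: Mapping[str, str]) -> float:
    """Toy model of the rsc_location rule (hypothetical helper).

    Mirrors boolean_op="or" over the two expressions on the
    "pingd" node attribute. Returns -infinity when the rule matches
    (the resource must not run on that node), otherwise 0.
    """
    pingd = node_attrs.get("pingd")

    # <expression ... operation="not_defined"/>
    not_defined = pingd is None

    # <expression ... operation="lte" value="0"/>
    lte_zero = pingd is not None and float(pingd) <= 0

    if not_defined or lte_zero:
        return MINUS_INFINITY
    return 0.0


# Sample attribute sets; the values 100 and 0 are assumptions.
reachable = {"pingd": "100"}   # node still reaches its ping node
cut_off = {"pingd": "0"}       # node lost connectivity
unknown = {}                   # pingd attribute was never set

for label, attrs in (("reachable", reachable),
                     ("cut_off", cut_off),
                     ("unknown", unknown)):
    print(label, connectivity_score(attrs))
```

So a node is excluded both when pingd reports no reachable ping nodes
and when the attribute was never set, which I assume is what happens if
pingd is not running there.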
I'm not sure what to do next. Should I move to pacemaker?
thanks to all
jf
> for more information, workarounds and status of the fix
> please have a look at the thread around
> http://thread.gmane.org/gmane.linux.network.drbd/14345/focus=14372
>
> sorry for putting out broken software. blame is on me, I did not double
> check QA, which would have been my job before committing anything.
>
> the fix we have pending undergoes thorough regression testing this time.
>
> --
> : Lars Ellenberg Tel +43-1-8178292-55 :
> : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
> : Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
--
<° ><