On Tue, Feb 26, 2008 at 11:05 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
>
>
>  On Tue, Feb 26, 2008 at 10:59:26AM -0500, Doug Lochart wrote:
>  > On Mon, Feb 25, 2008 at 6:10 PM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
>  > > Hi,
>  > >
>  > >
>  > >  On Mon, Feb 25, 2008 at 03:36:31PM -0500, Doug Lochart wrote:
>  > >  > heartbeat 2.1.3_3 and drbd 8.0.8 (dopd and STONITH ipmi in use)
>  > >  >
>  > >  > I was able to test my 2-node cluster successfully, simply by powering
>  > >  > the nodes off and on in varying order, and the HA resources
>  > >  > moved correctly in each case (hurray).
>  > >  > Then I went back to the test that frustrated me before: I yanked
>  > >  > all the ethernet cables from the primary machine (both LAN and
>  > >  > crossover).
>  > >  >
>  > >  > On the Secondary (unaffected) machine I see that STONITH tried to
>  > >  > shoot the other node for about 20 minutes before giving up.  Right now
>  > >  > my secondary node says Secondary/Unknown and the Primary node says
>  > >  > Primary/Unknown.
>  > >  >
>  > >  > First off, is there a configurable parameter for how long STONITH
>  > >  > keeps trying?
>  > >
>  > >  No. It should be trying forever. That's what is in the cluster
>  > >  configuration, i.e. protect resources using the stonith, and the
>  > >  cluster shouldn't move until there was a successful reset
>  > >  operation.
>  > >
>  > >
>  > >  > When I plugged the network back in, the Primary immediately rebooted
>  > >  > (not sure why)
>  > >
>  > >  Either stonith or fastfail. The logs would say.
>  > >
>  > >
>  > >  > and when it came back up I was in split brain again.
>  > >  >
>  > >  > So whenever you have 2 nodes in a cluster and all redundant
>  > >  > communication paths have been severed, then by default you will have a
>  > >  > Split Brain that needs to be manually corrected.  Am I understanding
>  > >  > this right?
>  > >
>  > >  No, it should recover automatically. Please take a look at the
>  > >  logs or post them.
>  >
>  > Dejan, I plan to rerun the tests this morning.  Do I need to have any
>  > specific settings in drbd.conf in order for it to recover
>  > automatically?  In case I did not say so before, I am using version 1
>  > config files under heartbeat 2.1.3_3.
>
>  Hmm, I thought you were referring to heartbeat. If you have a
>  drbd split brain, then I'm not sure if I can help.

Honestly, I did not know there was a split-brain distinction between
heartbeat and drbd.  I thought (wrongly, I see) that if you have a split
brain ... you have a split brain.  So you are saying that heartbeat
can have a split brain that is different from a drbd split brain?  If so,
then at least I can target my efforts appropriately.  The fact that
split-brain and drbd are mentioned and linked all over the HA site made
me assume that split-brain was all one and the same.

> If it's
>  something that happened just by switching from v1 to v2 then it
>  must be wrong usage.

No, I have not switched anything.

> I suppose that you read the drbd howto?

Yes, I will.  They finally put out a nice user's guide that should help
tremendously.  Much of my previous frustration with the lack of
documentation stemmed from DRBD and not Heartbeat.  For heartbeat the
documentation is there; you just have to find it, and when you don't
know what you are looking for (as a novice) the task proves more difficult.

I performed my test again and got the same Split-Brain.
However, I was wrong in my first post when I said it looked like stonith
had stopped (it had not); I think my tail session on the log was terminated
by mistake.  Anyway, when I put the cables back in, the second node
stonith'd the first and caused a restart.  Unfortunately they came
back up in Split Brain, as reported by the drbd kernel module.
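
For anyone repeating this test, here is a minimal sketch of how the
Secondary/Unknown and Primary/Unknown states could be checked from a
script instead of by eye.  It assumes the drbd 8.0-style /proc/drbd
layout, where each device line carries a "cs:" connection state and an
"st:Local/Peer" role pair; the sample text and the helper names
(parse_drbd_status, looks_disconnected) are made up for illustration.
A peer role of Unknown only shows that the nodes are not talking; the
actual split-brain verdict still comes from the kernel log.

```python
import re

# Per-device status line, assuming the 8.0-style layout:
#   " 0: cs:<state> st:<local>/<peer> ds:..."
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+st:(?P<local>\w+)/(?P<peer>\w+)"
)


def parse_drbd_status(text):
    """Return one dict per device line found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            devices.append(match.groupdict())
    return devices


def looks_disconnected(dev):
    """True when the peer role is Unknown, i.e. this node cannot see its peer."""
    return dev["peer"] == "Unknown"


# Hypothetical sample, not copied from a real node.
SAMPLE = """\
version: 8.0.8 (api:86/proto:86)
 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
"""

if __name__ == "__main__":
    for dev in parse_drbd_status(SAMPLE):
        print(dev["minor"], dev["cs"], dev["local"], dev["peer"],
              looks_disconnected(dev))
```

On a real node the text would come from reading /proc/drbd rather than
from SAMPLE.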

I will take these issues to the drbd list, then.

Thanks again for sharing your non-split-brain :)

regards,

Doug
>  Thanks,
>
>  Dejan
>
>
>
>  > thanks
>  >
>  > Doug
>  >
>  >
>  > >
>  > >  Thanks,
>  > >
>  > >  Dejan
>  > >
>  > >
>  > >  > I am not complaining I am just trying to determine what I am to expect
>  > >  > so I can write up procedures and what not.  The failover worked great
>  > >  > with other tests.
>  > >  >
>  > >  > regards,
>  > >  >
>  > >  > Doug
>  > >  >
>  > >  >
>  > >  >
>  > >  > --
>  > >  > What profits a man if he gains the whole world yet loses his soul?
>  > >  > _______________________________________________
>  > >  > Linux-HA mailing list
>  > >  > [email protected]
>  > >  > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>  > >  > See also: http://linux-ha.org/ReportingProblems
>  > >
>  >
>  >
>  >
>  > --
>  > What profits a man if he gains the whole world yet loses his soul?
>



-- 
What profits a man if he gains the whole world yet loses his soul?