On Mon, Feb 25, 2008 at 6:10 PM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
>
>  On Mon, Feb 25, 2008 at 03:36:31PM -0500, Doug Lochart wrote:
>  > heartbeat 2.1.3_3 and drbd 8.0.8 (dopd and STONITH ipmi in use)
>  >
>  > I was able to test my 2-node cluster simply by powering
>  > the nodes off and on in varying order, and the HA resources
>  > moved successfully in each case (hurray).
>  > Now I went back to my original test of previous frustration.  I yanked
>  > all the ethernet cables from the primary machine (both LAN and
>  > crossover)
>  >
>  > On the Secondary (unaffected) machine I see that STONITH tried to
>  > shoot the other node for about 20 minutes before giving up.  Right now
>  > my secondary node says Secondary/Unknown and the Primary Node says
>  > Primary/Unknown.
>  >
>  > First off, is there a configurable parameter for STONITH on how long it
>  > tries?
>
>  No. It should be trying forever. That's what the cluster
>  configuration asks for, i.e. protect resources using stonith, and the
>  cluster shouldn't move anything until a reset operation has
>  succeeded.
>
>
>  > When I plugged the network back in, the Primary immediately rebooted
>  > (not sure why)
>
>  Either stonith or fastfail. The logs would say.
>
>
>  > and when it came back up I was in split brain again.
>  >
>  > So whenever you have 2 nodes in a cluster and all redundant
>  > communication paths have been severed, then by default you will have a
>  > Split Brain that needs to be manually corrected.  Am I understanding
>  > this right?
>
>  No, it should recover automatically. Please take a look at the
>  logs or post them.

Dejan, I plan to rerun the tests this morning.  Do I need any
specific settings in drbd.conf for it to recover
automatically?  In case I did not say so before, I am using version 1
config files under heartbeat 2.1.3_3.

thanks

Doug


>
>  Thanks,
>
>  Dejan
>
>
>  > I am not complaining; I am just trying to determine what to expect
>  > so I can write up procedures and whatnot.  The failover worked great
>  > with other tests.
>  >
>  > regards,
>  >
>  > Doug
>  >
>  >
>  >
>  > --
>  > What profits a man if he gains the whole world yet loses his soul?
>  > _______________________________________________
>  > Linux-HA mailing list
>  > [email protected]
>  > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>  > See also: http://linux-ha.org/ReportingProblems
>



-- 
What profits a man if he gains the whole world yet loses his soul?
