On 07/13/09 14:19, Sergei Kolodka wrote:
>> 2. Node 1 panics
>> To node 2, this still looks like a split brain as it can not contact
>> node1. If the algo is modified to give priority to node1, you will have
>> a full cluster outage.
>
> Correct. However if node 2 panics it also loses connection to quorum server. 
> I'm pretty sure that we can reliably say that if node is not accessible via 
> both private and public interfaces it is 99.9% dead. Given choice between 
> 5-15 second delay before failover/crash of node and 15+ minute-long start of 
> tens of Oracle databases I'd choose 5-15 second delay.
In my opinion, it should be optional (i.e., something the administrator 
can configure), because many people want it to be 100% accurate, which 
is what the algorithm does now.

In this particular example, the quorum device may not be a Quorum Server 
(QS); it could be a disk, and then things become trickier. Also, if it 
is a QS and node1 is hung (not panicked), then the QS will still report 
that node1 is active.
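
To make the last point concrete, here is a minimal Python sketch of what 
node2 can observe in each case. It is only an illustration: the names 
(Node1State, Observation, observe, links_say_dead) are hypothetical and 
are not part of the cluster software, and the QS behaviour is modelled 
on the assumption discussed in this thread (a panicked node loses its 
QS connection, a hung or partitioned node does not).

```python
from dataclasses import dataclass
from enum import Enum


class Node1State(Enum):
    PANICKED = "panicked"        # node1 is down
    HUNG = "hung"                # node1 is unresponsive but not down
    PARTITIONED = "partitioned"  # node1 is healthy, only the links are cut


@dataclass(frozen=True)
class Observation:
    """What node2 can see from its side (hypothetical model)."""
    private_link_up: bool
    public_link_up: bool
    qs_reports_node1_active: bool


def observe(state: Node1State) -> Observation:
    # In all three cases node2 cannot reach node1 on either network.
    # Assumption: only a panic makes the QS stop reporting node1 as active.
    return Observation(
        private_link_up=False,
        public_link_up=False,
        qs_reports_node1_active=(state is not Node1State.PANICKED),
    )


def links_say_dead(obs: Observation) -> bool:
    # The proposed heuristic: unreachable on both interfaces means dead.
    return not obs.private_link_up and not obs.public_link_up


if __name__ == "__main__":
    for state in Node1State:
        obs = observe(state)
        print(state.value, links_say_dead(obs), obs.qs_reports_node1_active)
```

In this model the link-based heuristic declares node1 dead in all three 
states, and the hung and partitioned cases produce identical 
observations, so node2 cannot tell them apart even with a QS. With a 
disk as the quorum device there is no such report to consult at all.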



