Junko IKEDA wrote:
>> OK. I think you are misunderstanding the problem.
>>
>> When the communication between Node A & B is fine, you don't need any
>> kind of lock. Heartbeat itself can ensure the resource runs on one
>> selected node, and on one node only.
> 
> sfex_lock() just checks the status that shows which node succeeded in
> taking the lock.
> It does not keep trying to lock over and over again.
> 
>> sfex_lock is valuable when the communication between A & B is broken.
>> But when the communication IS broken, you can't assume sfex_lock will run
>> in order any more.
> 
> If the interconnect LAN is down, Split-Brain will occur.
> The lock status is reserved for Node A at this moment,
> but Node B is also trying to update the status in order to take the lock,
> because Split-Brain has arisen.
> While Node A checks the status, Node B might update it.
> Node A, whose status has been overwritten, is going to release the lock.
> sfex_lock() doesn't have such complex logic.

I believe the point he was trying to make is that sfex_lock() _needs_
that more complex logic in order to be correct every time, even in the
split-brain case - and I agree.

If this logic fails and both sides think they have exclusive access in a
split-brain case, then a filesystem on disk may be destroyed.  This is a
_very_ bad consequence - much worse than a crash.  It doesn't matter if
it is relatively unlikely, because the consequence is so terrible.  With
hundreds of thousands of clusters running Heartbeat, even unlikely
events eventually happen.
        http://linux-ha.org/BadThingsWillHappen

You should be able to run hundreds of thousands or millions of tests
where both sides are trying to get the lock at the same time, and be
able to verify that only one side got the lock - in every single case.
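
To make that concrete, here is a minimal sketch of such a test harness in
Python. It is not sfex code: SharedLockRecord, try_acquire, run_trial and
stress are all hypothetical names, and an in-process atomic check-and-set
stands in for the shared-disk lock record. A real test would have two
nodes contend for the actual disk while the interconnect is down.

```python
import threading


class SharedLockRecord:
    """Stand-in for the shared lock record (hypothetical, not sfex's format)."""

    def __init__(self):
        self._owner = None
        # The guard models an atomic check-and-set on the record.
        self._guard = threading.Lock()

    def try_acquire(self, node):
        # Atomically claim the record only if nobody owns it yet.
        with self._guard:
            if self._owner is None:
                self._owner = node
                return True
            return False


def run_trial():
    """Let nodes A and B race for a fresh record; return who thinks it won."""
    record = SharedLockRecord()
    start = threading.Barrier(2)  # release both contenders at the same time
    winners = []
    winners_guard = threading.Lock()

    def contend(node):
        start.wait()
        if record.try_acquire(node):
            with winners_guard:
                winners.append(node)

    threads = [threading.Thread(target=contend, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


def stress(trials):
    """Count the trials in which the number of winners was not exactly one."""
    failures = 0
    for _ in range(trials):
        if len(run_trial()) != 1:
            failures += 1
    return failures


if __name__ == "__main__":
    print("failures:", stress(10000))
```

Any non-zero failure count means the locking logic is not safe. The
stand-in passes only because its check and its update happen as one atomic
step; a check followed by a separate update, as described in the quoted
text above, is the pattern this kind of test is meant to catch.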

Please don't be discouraged.  Horms started a similar effort a few years
ago, but he wasn't able to spend enough time with it to get it right.

What you're doing is a valuable thing to do, and we all understand very
well that it's difficult.

When I first entered this discussion, I mentioned lockless
synchronization algorithms as being good things to study.  In this case,
we are trying to create a lock, but I suspect the lockless methods would
be a good way to synchronize the creation of a lock (even though this
sounds odd).

-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
