When I pull the cables between the shared storage and foo01, foo01 gets fenced.
Here is some info from foo02 about the shared storage and the dlm debug output
(the lock file seems to remain locked):

root@foo02:-//data/activemq_data# ls -li
total 276
 66467 -rw-r--r-- 1 root root 33030144 Dec 30 16:32 db-1.log
 66468 -rw-r--r-- 1 root root    73728 Dec 30 16:24 db.data
 66470 -rw-r--r-- 1 root root    53344 Dec 30 16:24 db.redo
128014 -rw-r--r-- 1 root root        0 Dec 30 19:49 dummy
 66466 -rw-r--r-- 1 root root        0 Dec 30 16:23 lock
root@foo02:-//data/activemq_data# grep -A 7 -i 103a2 /debug/dlm/activemq
Resource ffff81090faf96c0 Name (len=24) "       2           103a2"
Master Copy
Granted Queue
03d10002 PR Remote:   1 00c80001
00e00001 PR
Conversion Queue
Waiting Queue
--
Resource ffff81090faf97c0 Name (len=24) "       5           103a2"
Master Copy
Granted Queue
03c30003 PR Remote:   1 039a0001
03550001 PR
Conversion Queue
Waiting Queue
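
For anyone reading along: the 103a2 in the grep above is the inode number of
the lock file (66466) written in hex. Below is a rough Python sketch of how I
read this output. It assumes the resource name is a hex lock type followed by
a hex lock number (as GFS/GFS2 lock names appear to be laid out), and that
"Remote: N" is the node id of a remote lock holder followed by that node's
lock id. Both are my assumptions, not something taken from official docs. The
function name parse_dlm_debug and the dictionary keys are made up for
illustration.

```python
import re

# Sample copied from the /debug/dlm/activemq output above.
SAMPLE = '''Resource ffff81090faf96c0 Name (len=24) "       2           103a2"
Master Copy
Granted Queue
03d10002 PR Remote:   1 00c80001
00e00001 PR
Conversion Queue
Waiting Queue
--
Resource ffff81090faf97c0 Name (len=24) "       5           103a2"
Master Copy
Granted Queue
03c30003 PR Remote:   1 039a0001
03550001 PR
Conversion Queue
Waiting Queue
'''

RESOURCE_RE = re.compile(r'^Resource\s+(\S+)\s+Name \(len=(\d+)\) "(.*)"$')
# Lock line: local lock id, mode, then optionally "Remote: <node> <remote id>".
LOCK_RE = re.compile(
    r'^([0-9a-f]{8})\s+(\S+)(?:\s+Remote:\s+(\d+)\s+([0-9a-f]{8}))?'
)
QUEUES = {
    "Granted Queue": "granted",
    "Conversion Queue": "conversion",
    "Waiting Queue": "waiting",
}


def parse_dlm_debug(text):
    """Parse dlm debug text into a list of resource dictionaries."""
    resources = []
    current = None
    queue = None
    for raw in text.splitlines():
        line = raw.strip()
        m = RESOURCE_RE.match(line)
        if m:
            # Assumption: name is "<hex lock type> <hex lock number>".
            lock_type, number = m.group(3).split()
            current = {
                "address": m.group(1),
                "type": int(lock_type, 16),
                "number": int(number, 16),
                "master": False,
                "granted": [],
                "conversion": [],
                "waiting": [],
            }
            resources.append(current)
            queue = None
        elif current is None:
            continue
        elif line == "Master Copy":
            current["master"] = True
        elif line in QUEUES:
            queue = QUEUES[line]
        elif queue:
            m = LOCK_RE.match(line)
            if m:  # skips grep's "--" separators and blank lines
                current[queue].append({
                    "id": m.group(1),
                    "mode": m.group(2),
                    # None means no "Remote:" part, read here as a local lock.
                    "remote_node": int(m.group(3)) if m.group(3) else None,
                    "remote_id": m.group(4),
                })
    return resources


if __name__ == "__main__":
    print(hex(66466))  # inode of the lock file from ls -li, in hex
    for res in parse_dlm_debug(SAMPLE):
        print(res["type"], res["number"], res["granted"])
```

If that reading is right, both resources for inode 66466 still have a PR lock
from node 1 in the granted queue, next to one lock with no Remote entry.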


Are there any docs for interpreting this dlm debug output?


Regards,
Stevo.

On Fri, Dec 30, 2011 at 9:23 PM, Digimer <li...@alteeve.com> wrote:

> On 12/30/2011 03:08 PM, Stevo Slavić wrote:
> > Hi Digimer and Yvette,
> >
> > Thanks for the tips! I don't doubt the reliability of the technology, I
> > just want to make sure it is configured well.
> >
> > After fencing a node that held a lock on a file on shared storage, the
> > lock remains, and the non-fenced node cannot take over the lock on that
> > file. I am wondering how one can check which process (and from which node,
> > if possible) is holding a lock on a file on shared storage.
> > dlm should have taken care of releasing the lock once the node got fenced,
> > right?
> >
> > Regards,
> > Stevo.
>
> After a successful fence call, DLM will clean up any locks held by the
> lost node. That's why it's so critical that the fence action succeeds
> (i.e. test-test-test). If a node doesn't actually die in a fence, but the
> cluster thinks it did, and somehow the lost node returns, the lost node
> will think its locks are still valid and modify shared storage, leading
> to near-certain data corruption.
>
> It's all perfectly safe, provided you've tested your fencing properly. :)
>
> Yvette,
>
>  You might be right on the 'noatime' implying 'nodiratime'... I add
> both out of habit.
>
> --
> Digimer
> E-Mail:              digi...@alteeve.com
> Freenode handle:     digimer
> Papers and Projects: http://alteeve.com
> Node Assassin:       http://nodeassassin.org
> "omg my singularity battery is dead again.
> stupid hawking radiation." - epitron
>
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
