After pulling the cables between shared storage and foo01, foo01 gets fenced. Here is some output from foo02 showing the shared storage directory and the dlm debug data (the lock file seems to remain locked):

root@foo02:-//data/activemq_data#ls -li
total 276
66467 -rw-r--r-- 1 root root 33030144 Dec 30 16:32 db-1.log
66468 -rw-r--r-- 1 root root 73728 Dec 30 16:24 db.data
66470 -rw-r--r-- 1 root root 53344 Dec 30 16:24 db.redo
128014 -rw-r--r-- 1 root root 0 Dec 30 19:49 dummy
66466 -rw-r--r-- 1 root root 0 Dec 30 16:23 lock

root@foo02:-//data/activemq_data#grep -A 7 -i 103a2 /debug/dlm/activemq
Resource ffff81090faf96c0 Name (len=24) " 2 103a2"
Master Copy
Granted Queue
03d10002 PR Remote: 1 00c80001
00e00001 PR
Conversion Queue
Waiting Queue
--
Resource ffff81090faf97c0 Name (len=24) " 5 103a2"
Master Copy
Granted Queue
03c30003 PR Remote: 1 039a0001
03550001 PR
Conversion Queue
Waiting Queue

Are there some docs for interpreting this dlm debug output?

Regards,
Stevo.

On Fri, Dec 30, 2011 at 9:23 PM, Digimer <li...@alteeve.com> wrote:

> On 12/30/2011 03:08 PM, Stevo Slavić wrote:
> > Hi Digimer and Yvette,
> >
> > Thanks for tips! I don't doubt reliability of the technology, just want
> > to make sure it is configured well.
> >
> > After fencing a node that held a lock on a file on shared storage, lock
> > remains, and non-fenced node cannot take over the lock on that file.
> > Wondering how can one check which process (from which node if possible)
> > is holding a lock on a file on shared storage.
> > dlm should have taken care of releasing the lock once node got fenced,
> > right?
> >
> > Regards,
> > Stevo.
>
> After a successful fence call, DLM will clean up any locks held by the
> lost node. That's why it's so critical that the fence action succeeded
> (ie: test-test-test). If a node doesn't actually die in a fence, but the
> cluster thinks it did, and somehow the lost node returns, the lost node
> will think it's locks are still valid and modify shared storage, leading
> to near-certain data corruption.
>
> It's all perfectly safe, provided you've tested your fencing properly. :)
>
> Yvette,
>
> You might be right on the 'noatime' implying 'nodiratime'... I add
> both out of habit.
>
> --
> Digimer
> E-Mail: digi...@alteeve.com
> Freenode handle: digimer
> Papers and Projects: http://alteeve.com
> Node Assassin: http://nodeassassin.org
> "omg my singularity battery is dead again.
> stupid hawking radiation." - epitron
>
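
A note on where the 103a2 in the grep comes from: the lock file's inode number in the ls -li listing is 66466, which is 103a2 in hexadecimal, so the dlm resource names appear to carry the inode number in hex. Below is a minimal sketch of that lookup. The function names inode_to_hex, file_inode_hex and dlm_resources_for_file are hypothetical helpers made up for illustration, the debug file path is the one used in the grep above, and `stat -c %i` assumes GNU coreutils.

```shell
# Hypothetical helper: convert a decimal inode number to lowercase hex.
inode_to_hex() {
    printf '%x\n' "$1"
}

# Hypothetical helper: print the hex inode number of a file.
# Assumes GNU stat (the -c %i format prints the inode number).
file_inode_hex() {
    inode_to_hex "$(stat -c %i "$1")"
}

# Hypothetical helper: show dlm resources whose name mentions a file's inode.
# $1 is the file, $2 is the dlm debug file (e.g. /debug/dlm/activemq above).
# Assumption: the resource name ends with the inode number in hex.
dlm_resources_for_file() {
    grep -A 7 -i "$(file_inode_hex "$1")" "$2"
}

# The lock file's inode number from the ls -li listing.
inode_to_hex 66466
```

With those definitions, `dlm_resources_for_file /data/activemq_data/lock /debug/dlm/activemq` would run the same grep as above without having to convert the inode number by hand.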