[Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
Hi, I have a 2-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5, ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5. My problem is that every time I try to run /etc/init.d/o2cb stop, it fails with this error: Stopping O2CB cluster CLUSTER: Failed Unable to stop cluster as

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
ls -lR /sys/kernel/config/cluster What does this return? On 10/18/2011 02:05 PM, Laurentiu Gosu wrote: Hi, I have a 2-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5, ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5. My problem is that every time I try to run /etc/init.d/o2cb

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
What does this return? cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev Also, do: ls -lR /sys/kernel/debug/ocfs2 ls -lR /sys/kernel/debug/o2dlm On 10/18/2011 02:14 PM, Laurentiu Gosu wrote: Here is the output: ls -lR /sys/kernel/config/cluster

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
Again the outputs: cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev dm-2 ---here should be volgr1-lvol0 i guess? ls -lR /sys/kernel/debug/ocfs2 ls: /sys/kernel/debug/ocfs2: No such file or directory ls -lR /sys/kernel/debug/o2dlm ls:

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
mount -t debugfs debugfs /sys/kernel/debug Then list that dir. Also, do: ocfs2_hb_ctl -I -d /dev/dm-2 Be careful before killing. We want to be sure that dev is not mounted. On 10/18/2011 02:23 PM, Laurentiu Gosu wrote: Again the outputs: cat

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
ls -lR /sys/kernel/debug/ocfs2 /sys/kernel/debug/ocfs2: total 0 ls -lR /sys/kernel/debug/o2dlm /sys/kernel/debug/o2dlm: total 0 ocfs2_hb_ctl -I -d /dev/dm-2 ocfs2_hb_ctl: Device name specified was not found while reading uuid There is no /dev/dm-2 mounted. On 10/19/2011 00:27, Sunil Mushran

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
So it is not mounted. But we still have a hb thread because hb could not be stopped during umount. The reason for that could be the same that causes ocfs2_hb_ctl to fail. Do: mounted.ocfs2 -d On 10/18/2011 02:32 PM, Laurentiu Gosu wrote: ls -lR /sys/kernel/debug/ocfs2 /sys/kernel/debug/ocfs2:

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
mounted.ocfs2 -d Device FS Stack UUID Label /dev/mapper/volgr1-lvol0 ocfs2 o2cb 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2 mounted.ocfs2 -f Device FS Nodes /dev/mapper/volgr1-lvol0 ocfs2 ro02xsrv001 ro02xsrv001 = the other

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D On 10/18/2011 02:40 PM, Laurentiu Gosu wrote: mounted.ocfs2 -d Device FS Stack UUID Label /dev/mapper/volgr1-lvol0 ocfs2 o2cb 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2 mounted.ocfs2 -f

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs On 10/19/2011 00:43, Sunil Mushran wrote: ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D On 10/18/2011 02:40 PM, Laurentiu Gosu wrote: mounted.ocfs2 -d Device FS Stack UUID

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat No improvement :( On 10/19/2011 00:50, Sunil Mushran wrote: See if this cleans it up. ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D On 10/18/2011 02:44 PM,

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
See if this cleans it up. ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D On 10/18/2011 02:44 PM, Laurentiu Gosu wrote: ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs On 10/19/2011 00:43, Sunil Mushran wrote: ocfs2_hb_ctl -I -u

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
Let's do it by hand. rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D On 10/18/2011 02:52 PM, Laurentiu Gosu wrote: ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat No improvement :( On

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
well..this is weird ls /sys/kernel/config/cluster/CLUSTER/heartbeat/ *918673F06F8F4ED188DDCE14F39945F6* dead_threshold looks like we have different UUIDs. Where is this coming from?? ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6 918673F06F8F4ED188DDCE14F39945F6: 1 refs On 10/19/2011

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
Did you reformat the volume recently? or, when did you format last? On 10/18/2011 03:13 PM, Laurentiu Gosu wrote: well..this is weird ls /sys/kernel/config/cluster/CLUSTER/heartbeat/ *918673F06F8F4ED188DDCE14F39945F6* dead_threshold looks like we have different UUIDs. Where is this coming

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
Yes, I did reformat it (even more than once, I think, last week). This is a pre-production system and I'm trying various options before moving into real life. On 10/19/2011 01:19, Sunil Mushran wrote: Did you reformat the volume recently? or, when did you format last? On 10/18/2011 03:13 PM,

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
One way this can happen is if one starts the hb manually and then force-formats that volume. The format will generate a new uuid. Once that happens, the hb tool cannot map the region to the device and thus fails to stop it. Right now the easiest option on this box is resetting it. On
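
The mismatch described above (a heartbeat region whose name no longer matches any on-disk volume UUID) can be checked for with a small script. This is only a sketch under assumptions: `stale_regions` is a hypothetical helper name, the heartbeat directory layout is the one shown in this thread (one sub-directory per region, plus plain attribute files such as dead_threshold), and the volume UUIDs are passed in by hand, for example copied from the UUID column of `mounted.ocfs2 -d`.

```shell
#!/bin/sh
# stale_regions (hypothetical helper): print every heartbeat region name
# that does not match any of the volume UUIDs given as arguments.
#   $1     heartbeat directory, e.g. /sys/kernel/config/cluster/CLUSTER/heartbeat
#   $2...  known volume UUIDs, e.g. from the UUID column of `mounted.ocfs2 -d`
stale_regions() {
    hb_dir=$1
    shift
    # The trailing slash restricts the glob to directories, so attribute
    # files such as dead_threshold are skipped.
    for region in "$hb_dir"/*/; do
        [ -d "$region" ] || continue    # the glob matched nothing
        name=$(basename "$region")
        match=no
        for uuid in "$@"; do
            if [ "$name" = "$uuid" ]; then
                match=yes
            fi
        done
        if [ "$match" = no ]; then
            echo "$name"
        fi
    done
    return 0
}
```

With the values from this thread, `stale_regions /sys/kernel/config/cluster/CLUSTER/heartbeat 0C4AB55FE9314FA5A9F81652FDB9B22D` would be expected to print the old region 918673F06F8F4ED188DDCE14F39945F6. The script only reads directory names; it does not stop or delete anything.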

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
OK, I rebooted one of the nodes (both had similar issues). But something is still fishy. - I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/ - I unmounted it: umount /mnt/tmp/ - I tried to stop o2cb: /etc/init.d/o2cb stop Stopping O2CB cluster CLUSTER: Failed Unable to stop cluster

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
Manual delete will only work if there are no references. In your case there are references. You may want to start both nodes from scratch. Do not start/stop heartbeat manually. Also, do not force-format. On 10/18/2011 03:54 PM, Laurentiu Gosu wrote: OK, I rebooted one of the nodes (both had
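
Since a manual delete only works when there are no references, it may help to check the count before trying. The following is a minimal sketch, assuming the `<UUID>: <N> refs` line format seen in the `ocfs2_hb_ctl -I -u` output earlier in this thread; `hb_refs` and `region_is_unreferenced` are hypothetical helper names, and the line to parse is passed in as a string rather than read from the tool.

```shell
#!/bin/sh
# hb_refs (hypothetical helper): extract N from a line of the form
# "<UUID>: <N> refs". Prints nothing if the line does not match.
hb_refs() {
    printf '%s\n' "$1" | sed -n 's/^[0-9A-Fa-f]*: \([0-9][0-9]*\) refs$/\1/p'
}

# region_is_unreferenced (hypothetical helper): succeeds only when the
# line reports exactly zero references.
region_is_unreferenced() {
    [ "$(hb_refs "$1")" = "0" ]
}
```

For example, `region_is_unreferenced "918673F06F8F4ED188DDCE14F39945F6: 1 refs"` fails, which matches the advice above that the manual delete will not work for that region.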