Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2012-12-28 Thread quanta
To kill this heartbeat region without rebooting, you can change the UUID 
of your OCFS2 volume:


   tunefs.ocfs2 --uuid-reset=0C4AB55FE9314FA5A9F81652FDB9B22D /dev/mapper/volgr1-lvol0

and try again:

   ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
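
The two steps above can be sketched as a small dry-run helper. This is only an illustration and not part of ocfs2-tools: the function name print_hb_recovery_plan is made up here, and it merely prints the tunefs.ocfs2 and ocfs2_hb_ctl invocations quoted in this message (after a basic check that the UUID is 32 hex digits), so they can be reviewed before anything is run.

```shell
# print_hb_recovery_plan DEVICE STALE_UUID
# Hypothetical helper: prints the two commands from this message, runs nothing.
print_hb_recovery_plan() {
    device=$1
    stale_uuid=$2

    # Reject anything that is empty or contains a non-hex character.
    case $stale_uuid in
        ''|*[!0-9A-Fa-f]*)
            echo "invalid UUID: $stale_uuid" >&2
            return 1
            ;;
    esac

    # OCFS2 UUIDs in this thread are written as 32 hex digits.
    if [ "${#stale_uuid}" -ne 32 ]; then
        echo "invalid UUID length: $stale_uuid" >&2
        return 1
    fi

    # Step 1: give the volume the UUID of the stale heartbeat region.
    printf 'tunefs.ocfs2 --uuid-reset=%s %s\n' "$stale_uuid" "$device"
    # Step 2: stop the heartbeat region by that UUID.
    printf 'ocfs2_hb_ctl -K -u %s\n' "$stale_uuid"
}
```

Called as print_hb_recovery_plan /dev/mapper/volgr1-lvol0 0C4AB55FE9314FA5A9F81652FDB9B22D, it prints the two commands shown above, one per line.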

On 10/19/2011 05:33 AM, Sunil Mushran wrote:

One way this can happen is if one starts the hb manually and then
force-formats that volume. The format will generate a new uuid. Once that
happens, the hb tool cannot map the region to the device and thus fails
to stop it. Right now the easiest option on this box is resetting it.

On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:

Yes, I did reformat it (even more than once, I think, last week). This is a 
pre-production system and I'm trying various options before moving into real 
life.


On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? or, when did you format last?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

well..this is weird
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

looks like we have different UUIDs. Where is this coming from??

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs
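
The mismatch noticed here can be checked mechanically: list the region directories under the cluster's heartbeat directory in configfs and compare their names with the UUIDs that mounted.ocfs2 -d reports. A minimal sketch, assuming the heartbeat directory is passed as the first argument and the known volume UUIDs follow it; the function names list_hb_regions and find_stale_regions are hypothetical and do not exist in ocfs2-tools.

```shell
# list_hb_regions DIR
# Print the name of every subdirectory of DIR (one heartbeat region each).
# Plain files such as dead_threshold are skipped.
list_hb_regions() {
    dir=$1
    for entry in "$dir"/*/; do
        [ -d "$entry" ] || continue
        basename "$entry"
    done
}

# find_stale_regions DIR UUID...
# Print every region under DIR whose name matches none of the given UUIDs.
find_stale_regions() {
    dir=$1
    shift
    for region in $(list_hb_regions "$dir"); do
        stale=yes
        for uuid in "$@"; do
            if [ "$region" = "$uuid" ]; then
                stale=no
            fi
        done
        if [ "$stale" = yes ]; then
            echo "$region"
        fi
    done
    return 0
}
```

For the situation in this message, find_stale_regions /sys/kernel/config/cluster/CLUSTER/heartbeat 0C4AB55FE9314FA5A9F81652FDB9B22D would be expected to report the 918673F0... region, since no volume carries that UUID any more.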


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2

mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
  ls /dev/dm-*
/dev/dm-0  /dev/dm-1
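
To compare the on-disk UUID with the heartbeat region name, the UUID column can be pulled out of the mounted.ocfs2 -d listing above. A small sketch, assuming the listing is whitespace-separated with the device in the first field and the UUID in the fourth, as in the output shown in this message; ocfs2_uuid_for_device is a hypothetical name.

```shell
# ocfs2_uuid_for_device DEVICE
# Reads a "mounted.ocfs2 -d" style listing on stdin and prints the UUID
# (fourth field) of the row whose first field equals DEVICE.
ocfs2_uuid_for_device() {
    awk -v dev="$1" '$1 == dev { print $4 }'
}
```

Typical use would be mounted.ocfs2 -d | ocfs2_uuid_for_device /dev/mapper/volgr1-lvol0, which prints nothing if the device is not listed.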


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid

There is no /dev/dm-2 mounted.


On 10/19/2011 00:27, Sunil Mushran wrote:

mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -I -d /dev/dm-2

Be careful before killing. We want to be sure that dev is not mounted.

On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:

Again, the outputs:
  cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
---> this should be volgr1-lvol0, I guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think i have to enable debug first somehow..?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:

What does this return?
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev

Also, do:
ls -lR /sys/kernel/debug/ocfs2
ls -lR /sys/kernel/debug/o2dlm

On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:

Here is the output:

ls -lR /sys/kernel/config/cluster
/sys/kernel/config/cluster:
total 0
drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER

/sys/kernel/config/cluster/CLUSTER:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
-rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
-rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
drwxr-xr-x 4 root root    0 Oct 11 20:23 node
-rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms

/sys/kernel/config/cluster/CLUSTER/heartbeat:
total 0
drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
-r--r--r-- 1 root root 4096 Oct 19 00:12 pid
-rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block

/sys/kernel/config/cluster/CLUSTER/node:
total 0
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
drwxr-xr-x 2 

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2012-12-26 Thread quanta

OK. Problem solved by running:

# tunefs.ocfs2 --uuid-reset=72EF09EA3D0D4F51BDC00B47432B1EB2 /dev/drbd1
WARNING!!! OCFS2 uses the UUID to uniquely identify a file system.
Having two OCFS2 file systems with the same UUID could, in the least,
cause erratic behavior, and if unlucky, cause file system damage.
Please choose the UUID with care.
Update the UUID ?yes

More details here: http://serverfault.com/a/460944/59925

On 12/25/2012 12:35 AM, quanta wrote:

I accidentally reformatted the volume.
Is there any way to get rid of this problem without rebooting?

# mounted.ocfs2 -d
Device      FS     Stack  UUID                              Label
/dev/sdb    ocfs2  o2cb   12963EAF4E16484DB81ECB0251177C26  ocfs2_drbd1
/dev/drbd1  ocfs2  o2cb   12963EAF4E16484DB81ECB0251177C26  ocfs2_drbd1

# ls -l /sys/kernel/config/cluster/cpc/heartbeat/
drwxr-xr-x 2 root root 0 Dec 24 22:53 72EF09EA3D0D4F51BDC00B47432B1EB2

# ocfs2_hb_ctl -I -u 72EF09EA3D0D4F51BDC00B47432B1EB2
72EF09EA3D0D4F51BDC00B47432B1EB2: 7 refs

# ocfs2_hb_ctl -K -u 72EF09EA3D0D4F51BDC00B47432B1EB2
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat



Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2012-12-24 Thread quanta

I accidentally reformatted the volume.
Is there any way to get rid of this problem without rebooting?

# mounted.ocfs2 -d
Device      FS     Stack  UUID                              Label
/dev/sdb    ocfs2  o2cb   12963EAF4E16484DB81ECB0251177C26  ocfs2_drbd1
/dev/drbd1  ocfs2  o2cb   12963EAF4E16484DB81ECB0251177C26  ocfs2_drbd1

# ls -l /sys/kernel/config/cluster/cpc/heartbeat/
drwxr-xr-x 2 root root 0 Dec 24 22:53 72EF09EA3D0D4F51BDC00B47432B1EB2

# ocfs2_hb_ctl -I -u 72EF09EA3D0D4F51BDC00B47432B1EB2
72EF09EA3D0D4F51BDC00B47432B1EB2: 7 refs

# ocfs2_hb_ctl -K -u 72EF09EA3D0D4F51BDC00B47432B1EB2
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat



Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-12-11 Thread Laurentiu Gosu


Hi Sunil,
Maybe you remember the thread below. In short, the problem was that the 
heartbeat region was still active after unmounting the ocfs2 volume (I use 
the latest UEK + ocfs2-tools).
Based on this link 
http://markmail.org/message/7h7r32avuitqdhzr#query:+page:1+mid:lq7arecz2dui6b3v+state:results 
I manually created a /dev/dm-2 symlink pointing to my SAN device 
[/dev/mapper/volgr1-lvol0] and the heartbeat was stopped normally. Maybe 
it helps you find the real issue. As I understand it, that symlink should be 
created automatically, but it seems the problem is still there in 
ocfs2-tools-1.6.3-2.el5.
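
The workaround described above suggests a quick check before stopping the cluster: read the dev attribute of the heartbeat region and see whether a node with that name exists under /dev. A rough sketch only; region_dev_status is a hypothetical name, and the optional second argument for the device root exists purely so the function can be tried against a scratch directory.

```shell
# region_dev_status REGION_DIR [DEV_ROOT]
# Reads REGION_DIR/dev (e.g. "dm-2") and reports whether DEV_ROOT/<name>
# exists. DEV_ROOT defaults to /dev.
region_dev_status() {
    region_dir=$1
    dev_root=${2:-/dev}

    # The dev attribute holds the kernel's name for the heartbeat device.
    name=$(cat "$region_dir/dev") || return 1

    if [ -e "$dev_root/$name" ]; then
        echo "present $name"
    else
        # In this thread the missing node was worked around with a symlink,
        # presumably something like:
        #   ln -s /dev/mapper/volgr1-lvol0 /dev/dm-2
        # (the exact command is not given in the message; this is an assumption).
        echo "missing $name"
    fi
}
```

On the affected node, region_dev_status /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D would be expected to report "missing dm-2" until the device node or symlink is in place.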


br,
laurentiu.

On 10/24/2011 23:54, Sunil Mushran wrote:

Well, I wouldn't advise you to go into prod with this problem.
To figure out the issue, we'll need to provide a debug version of
ocfs2_hb_ctl.

If you have support, ping oracle support and ask for assistance.

If not, download the source and run ocfs2_hb_ctl in gdb. The problem
is in the code path that begins in the function lookup_dev().

On 10/23/2011 01:30 PM, Laurentiu Gosu wrote:

#rpm -qa |grep ocfs2
ocfs2console-1.6.3-2.el5
ocfs2-tools-1.6.3-2.el5

Just let me know if I can give more details to find the problem. I 
will move ocfs2 into production in the next weeks.



On 10/23/2011 22:49, Sunil Mushran wrote:

Are you sure you have ocfs2-tools-1.6.3? I remember we had an
issue with this with an earlier release... 1.6.1/.2.

On 10/23/2011 10:43 AM, Laurentiu Gosu wrote:

hmm..
#ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
*BUT:*
#ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
I can still kill the ref using device name (-d).

On 10/23/2011 17:57, Sunil Mushran wrote:

I think it stops by uuid. So try doing this the next time.
You are encountering some issue that we have not seen before.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2

On 10/23/2011 05:32 AM, Laurentiu Gosu wrote:

Hi Sunil,
Sorry for my late reply, i just had time today to start from 
scratch and test.
I rebuilt my environment(2 nodes connected to a SAN via 
iSCSI+multipath). I still have the issue that the heartbeat is 
active after I umount my ocfs2 volume.

/etc/init.d/o2cb stop
Stopping O2CB cluster CLUST: Failed
Unable to stop cluster as heartbeat region still active

ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs

After I manually kill the ref (ocfs2_hb_ctl -K -d 
/dev/mapper/volgr1-lvol0 ocfs2) I can stop o2cb successfully. I 
can live with that, but why doesn't it stop automatically? As I 
understand it, the heartbeat should be started and stopped when the 
volume gets mounted/umounted.


br,
Laurentiu.

On 10/19/2011 02:28, Sunil Mushran wrote:

Manual delete will only work if there are no references. In your case
there are references.

You may want to start both nodes from scratch. Do not start/stop
heartbeat manually. Also, do not force-format.
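
Since a manual delete only works when there are no references, it can help to extract the count from the ocfs2_hb_ctl -I output before attempting anything. A minimal sketch, assuming the output has the form "UUID: N refs" as seen elsewhere in this thread; hb_ref_count is a hypothetical name.

```shell
# hb_ref_count
# Reads "ocfs2_hb_ctl -I" style output on stdin and prints the reference
# count from a line of the form "<32 hex digits>: N refs".
# Prints nothing if no such line is present.
hb_ref_count() {
    sed -n 's/^[0-9A-Fa-f]\{32\}: \([0-9][0-9]*\) refs$/\1/p'
}
```

For example, ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D | hb_ref_count would print the number of references, which should be 0 before a manual delete has any chance of working.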

On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
OK, I rebooted one of the nodes (both had similar issues). But 
something is still fishy.

- I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
- I unmounted it: umount /mnt/tmp/
- tried to stop o2cb:  /etc/init.d/o2cb stop
Stopping O2CB cluster CLUSTER: Failed
Unable to stop cluster as heartbeat region still active
- ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
-  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

- ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
/sys/kernel/config/cluster/CLUSTER/heartbeat/:
total 0
drwxr-xr-x 2 root root    0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
-rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid
-rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block

- I cannot manually delete 
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/


PS: i'm going to sleep now, i have to be up in a few hours. We 
can continue tomorrow if it's ok with you.

Thank you for your help.

Laurentiu.


Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-23 Thread Laurentiu Gosu

#rpm -qa |grep ocfs2
ocfs2console-1.6.3-2.el5
ocfs2-tools-1.6.3-2.el5

Just let me know if I can give more details to find the problem. I will 
move ocfs2 into production in the next weeks.




Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-23 Thread Sunil Mushran

Are you sure you have ocfs2-tools-1.6.3? I remember we had an
issue with this with an earlier release... 1.6.1/.2.

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-23 Thread Laurentiu Gosu

hmm..
#ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
*BUT:*
#ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
I can still kill the ref using device name (-d).

On 10/23/2011 17:57, Sunil Mushran wrote:

I think it stops by uuid. So try doing this the next time.
You are encountering some issue that we have not seen before.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2

On 10/23/2011 05:32 AM, Laurentiu Gosu wrote:

Hi Sunil,
Sorry for my late reply, i just had time today to start from scratch 
and test.
I rebuilt my environment(2 nodes connected to a SAN via 
iSCSI+multipath). I still have the issue that the heartbeat is active 
after I umount my ocfs2 volume.

/etc/init.d/o2cb stop
Stopping O2CB cluster CLUST: Failed
Unable to stop cluster as heartbeat region still active

ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs

After i manually kill the ref (ocfs2_hb_ctl -K -d 
/dev/mapper/volgr1-lvol0 ocfs2 ) i can stop successfully o2cb. I can 
live with that but why doesn't it stop automatically? As i 
understand, hearbeat should be started and stopped once the volume 
gets mounted/umounted.


br,
Laurentiu.

On 10/19/2011 02:28, Sunil Mushran wrote:

Manual delete will only work if there are no references. In your case
there are references.

You may want to start both nodes from scratch. Do not start/stop
heartbeat manually. Also, do not force-format.

On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
OK, i rebooted one of the nodes(both had similar issues); . But 
something is still fishy.

- i mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
- i unmount it: umount /mnt/tmp/
- tried to stop o2cb:  /etc/init.d/o2cb stop
Stopping O2CB cluster CLUSTER: Failed
Unable to stop cluster as heartbeat region still active
- ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
-  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
- ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
/sys/kernel/config/cluster/CLUSTER/heartbeat/:
total 0
drwxr-xr-x 2 root root0 Oct 19 01:50 
0C4AB55FE9314FA5A9F81652FDB9B22D

-rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid
-rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block

- i cannot manually delete 
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/


PS: i'm going to sleep now, i have to be up in a few hours. We can 
continue tomorrow if it's ok with you.

Thank you for your help.

Laurentiu.

On 10/19/2011 01:33, Sunil Mushran wrote:
One way this can happen is if one starts the hb manually and then
force-formats that volume. The format will generate a new UUID. Once that
happens, the hb tool cannot map the region to the device and thus fails
to stop it. Right now the easiest option on this box is resetting it.

On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
Yes, I did reformat it (more than once, I think, last week).
This is a pre-production system and I'm trying various options
before moving into real life.



On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

Looks like we have different UUIDs. Where is this coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D


On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat


No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2


mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the oth

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-23 Thread Sunil Mushran

I think it stops by uuid. So try doing this the next time.
You are encountering some issue that we have not seen before.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2

On 10/23/2011 05:32 AM, Laurentiu Gosu wrote:

Hi Sunil,
Sorry for my late reply; I only had time today to start from scratch
and test.
I rebuilt my environment (2 nodes connected to a SAN via
iSCSI+multipath). I still have the issue that the heartbeat is still
active after I unmount my ocfs2 volume.

/etc/init.d/o2cb stop
Stopping O2CB cluster CLUST: Failed
Unable to stop cluster as heartbeat region still active

ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs

After I manually kill the ref (ocfs2_hb_ctl -K -d
/dev/mapper/volgr1-lvol0 ocfs2), I can stop o2cb successfully. I can
live with that, but why doesn't it stop automatically? As I understand it,
the heartbeat should be started and stopped when the volume gets
mounted/unmounted.


br,
Laurentiu.

On 10/19/2011 02:28, Sunil Mushran wrote:

Manual delete will only work if there are no references. In your case
there are references.

You may want to start both nodes from scratch. Do not start/stop
heartbeat manually. Also, do not force-format.

On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
OK, I rebooted one of the nodes (both had similar issues). But
something is still fishy.

- I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
- I unmounted it: umount /mnt/tmp/
- I tried to stop o2cb: /etc/init.d/o2cb stop
Stopping O2CB cluster CLUSTER: Failed
Unable to stop cluster as heartbeat region still active
- ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
-  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
- ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
/sys/kernel/config/cluster/CLUSTER/heartbeat/:
total 0
drwxr-xr-x 2 root root    0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
-rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid
-rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block

- I cannot manually delete
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/


PS: I'm going to sleep now; I have to be up in a few hours. We can
continue tomorrow if that's OK with you.

Thank you for your help.

Laurentiu.

On 10/19/2011 01:33, Sunil Mushran wrote:

One way this can happen is if one starts the hb manually and then
force-formats that volume. The format will generate a new UUID. Once that
happens, the hb tool cannot map the region to the device and thus fails
to stop it. Right now the easiest option on this box is resetting it.

On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
Yes, I did reformat it (more than once, I think, last week).
This is a pre-production system and I'm trying various options
before moving into real life.



On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

Looks like we have different UUIDs. Where is this coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D


On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat


No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2


mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-23 Thread Laurentiu Gosu

Hi Sunil,
Sorry for my late reply; I only had time today to start from scratch and
test.
I rebuilt my environment (2 nodes connected to a SAN via
iSCSI+multipath). I still have the issue that the heartbeat is still
active after I unmount my ocfs2 volume.

/etc/init.d/o2cb stop
Stopping O2CB cluster CLUST: Failed
Unable to stop cluster as heartbeat region still active

ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs

After I manually kill the ref (ocfs2_hb_ctl -K -d
/dev/mapper/volgr1-lvol0 ocfs2), I can stop o2cb successfully. I can
live with that, but why doesn't it stop automatically? As I understand it,
the heartbeat should be started and stopped when the volume gets
mounted/unmounted.
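
The manual workaround described above could be sketched as a small dry-run script. This is only an illustration, assuming the `ocfs2_hb_ctl -I` output always has the `<UUID>: <N> refs` form shown in this thread. The function names `hb_refs` and `plan_o2cb_stop` are hypothetical, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: extract the reference count from one line of
# `ocfs2_hb_ctl -I` output, e.g. "0C4A...B22D: 1 refs".
hb_refs() {
    printf '%s\n' "$1" | sed -n 's/^[0-9A-Fa-f]*: \([0-9][0-9]*\) refs$/\1/p'
}

# Hypothetical helper: print (not run) the commands to stop o2cb, given the
# device and the info line captured after umount.
plan_o2cb_stop() {
    device=$1
    info_line=$2
    refs=$(hb_refs "$info_line")
    if [ -z "$refs" ]; then
        echo "unrecognized ocfs2_hb_ctl output" >&2
        return 1
    fi
    if [ "$refs" -gt 0 ]; then
        # A reference is still held after umount: drop it manually first.
        echo "ocfs2_hb_ctl -K -d $device ocfs2"
    fi
    echo "/etc/init.d/o2cb stop"
}

plan_o2cb_stop /dev/mapper/volgr1-lvol0 "0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs"
```

Printing the plan first keeps the kill step visible, which matters because a region should only be killed once the volume is known to be unmounted.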


br,
Laurentiu.

On 10/19/2011 02:28, Sunil Mushran wrote:

Manual delete will only work if there are no references. In your case
there are references.

You may want to start both nodes from scratch. Do not start/stop
heartbeat manually. Also, do not force-format.

On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
OK, I rebooted one of the nodes (both had similar issues). But
something is still fishy.

- I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
- I unmounted it: umount /mnt/tmp/
- I tried to stop o2cb: /etc/init.d/o2cb stop
Stopping O2CB cluster CLUSTER: Failed
Unable to stop cluster as heartbeat region still active
- ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
-  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
- ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
/sys/kernel/config/cluster/CLUSTER/heartbeat/:
total 0
drwxr-xr-x 2 root root    0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
-rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid
-rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block

- I cannot manually delete
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/


PS: I'm going to sleep now; I have to be up in a few hours. We can
continue tomorrow if that's OK with you.

Thank you for your help.

Laurentiu.

On 10/19/2011 01:33, Sunil Mushran wrote:

One way this can happen is if one starts the hb manually and then
force-formats that volume. The format will generate a new UUID. Once that
happens, the hb tool cannot map the region to the device and thus fails
to stop it. Right now the easiest option on this box is resetting it.

On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
Yes, I did reformat it (more than once, I think, last week).
This is a pre-production system and I'm trying various options
before moving into real life.



On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

Looks like we have different UUIDs. Where is this coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D


On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat


No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2


mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran

Manual delete will only work if there are no references. In your case
there are references.

You may want to start both nodes from scratch. Do not start/stop
heartbeat manually. Also, do not force-format.
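
The point about references can be illustrated with a guard that refuses the manual delete while the region is still referenced. This is a minimal sketch, assuming the `<UUID>: <N> refs` output format of `ocfs2_hb_ctl -I` seen in this thread. `region_removable` is a hypothetical name, and the caller supplies the captured output line.

```shell
#!/bin/sh
# Hypothetical guard: succeed only when the captured `ocfs2_hb_ctl -I` line
# reports zero references for the region.
region_removable() {
    case "$1" in
        *": 0 refs") return 0 ;;
        *) return 1 ;;
    esac
}

line="0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs"
if region_removable "$line"; then
    echo "no references: manual delete may work"
else
    echo "still referenced: manual delete will fail"
fi
```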

On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:

OK, I rebooted one of the nodes (both had similar issues). But something is
still fishy.
- I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
- I unmounted it: umount /mnt/tmp/
- I tried to stop o2cb: /etc/init.d/o2cb stop
Stopping O2CB cluster CLUSTER: Failed
Unable to stop cluster as heartbeat region still active
- ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
-  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
- ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
/sys/kernel/config/cluster/CLUSTER/heartbeat/:
total 0
drwxr-xr-x 2 root root    0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
-rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid
-rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block

- I cannot manually delete
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/

PS: I'm going to sleep now; I have to be up in a few hours. We can continue
tomorrow if that's OK with you.
Thank you for your help.

Laurentiu.

On 10/19/2011 01:33, Sunil Mushran wrote:

One way this can happen is if one starts the hb manually and then
force-formats that volume. The format will generate a new UUID. Once that
happens, the hb tool cannot map the region to the device and thus fails
to stop it. Right now the easiest option on this box is resetting it.

On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:

Yes, I did reformat it (more than once, I think, last week). This is a
pre-production system and I'm trying various options before moving into real
life.


On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

Looks like we have different UUIDs. Where is this coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2

mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid

There is no /dev/dm-2 mounted.
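
The check done by hand in this thread (the region's `dev` attribute names `dm-2`, but no such node exists) could be sketched as follows. `region_dev_present` is a hypothetical helper. It assumes the `dev` file holds a bare device name, as in the `cat` output quoted in this thread, and it takes the device directory as a parameter so it can be tried against a scratch directory instead of the live system.

```shell
#!/bin/sh
# Hypothetical helper: report whether the device named in a heartbeat
# region's `dev` attribute exists under the given device directory.
#   $1: region directory (e.g. .../heartbeat/<UUID>)
#   $2: device directory (normally /dev)
region_dev_present() {
    name=$(cat "$1/dev") || return 2
    if [ -e "$2/$name" ]; then
        echo "$name present"
    else
        echo "$name missing"
        return 1
    fi
}

# Demonstration against a scratch directory rather than the live system.
tmp=$(mktemp -d)
mkdir "$tmp/region" "$tmp/dev"
echo dm-2 > "$tmp/region/dev"
: > "$tmp/dev/dm-0"
: > "$tmp/dev/dm-1"
region_dev_present "$tmp/region" "$tmp/dev" || true
rm -rf "$tmp"
```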


On 10/19/2011 00:27, Sunil Mushran wrote:

mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -I -d /dev/dm-2

Be careful before killing. We want to be sure that dev is not mounted.

On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:

Again, the outputs:
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
---> here should be volgr1-lvol0, I guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think I have to enable debug first somehow?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:

What does 

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
OK, I rebooted one of the nodes (both had similar issues). But
something is still fishy.

- I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
- I unmounted it: umount /mnt/tmp/
- I tried to stop o2cb: /etc/init.d/o2cb stop
Stopping O2CB cluster CLUSTER: Failed
Unable to stop cluster as heartbeat region still active
- ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
-  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
- ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
/sys/kernel/config/cluster/CLUSTER/heartbeat/:
total 0
drwxr-xr-x 2 root root    0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
-rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid
-rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block

- I cannot manually delete
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/


PS: I'm going to sleep now; I have to be up in a few hours. We can
continue tomorrow if that's OK with you.

Thank you for your help.

Laurentiu.

On 10/19/2011 01:33, Sunil Mushran wrote:

One way this can happen is if one starts the hb manually and then
force-formats that volume. The format will generate a new UUID. Once that
happens, the hb tool cannot map the region to the device and thus fails
to stop it. Right now the easiest option on this box is resetting it.

On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
Yes, I did reformat it (more than once, I think, last week). This
is a pre-production system and I'm trying various options before
moving into real life.



On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

Looks like we have different UUIDs. Where is this coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D


On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat


No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2


mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid


There is no /dev/dm-2 mounted.


On 10/19/2011 00:27, Sunil Mushran wrote:

mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -I -d /dev/dm-2

Be careful before killing. We want to be sure that dev is 
not mounted.


On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:

Again, the outputs:
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
---> here should be volgr1-lvol0, I guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think I have to enable debug first somehow?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:

What does this return?
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev


Also, do:
ls -lR /sys/kernel/debug/ocfs2
ls -lR /sys/kernel/debug/o2dlm

On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:

Here is the outpu

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran

One way this can happen is if one starts the hb manually and then
force-formats that volume. The format will generate a new UUID. Once that
happens, the hb tool cannot map the region to the device and thus fails
to stop it. Right now the easiest option on this box is resetting it.
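
One way to spot this mismatch is to compare the region names under configfs with the UUID currently on disk. The sketch below is illustrative only: `stale_regions` is a hypothetical helper, and it takes the two lists as plain arguments (in practice they might come from `ls /sys/kernel/config/cluster/CLUSTER/heartbeat/`, minus `dead_threshold`, and from the UUID column of `mounted.ocfs2 -d`).

```shell
#!/bin/sh
# Hypothetical helper: print every active heartbeat region whose name does
# not match any UUID currently found on disk.
#   $1: space-separated region names (directories under .../heartbeat/)
#   $2: space-separated on-disk UUIDs
stale_regions() {
    for region in $1; do
        found=no
        for uuid in $2; do
            [ "$region" = "$uuid" ] && found=yes
        done
        [ "$found" = no ] && echo "$region"
    done
    return 0
}

# Values taken from this thread: the region was started before the reformat.
stale_regions "918673F06F8F4ED188DDCE14F39945F6" "0C4AB55FE9314FA5A9F81652FDB9B22D"
```

A region reported here is one that the hb tool can no longer map back to a device by UUID, which matches the failure described above.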

On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:

Yes, I did reformat it (more than once, I think, last week). This is a
pre-production system and I'm trying various options before moving into real
life.


On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

Looks like we have different UUIDs. Where is this coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2

mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid

There is no /dev/dm-2 mounted.


On 10/19/2011 00:27, Sunil Mushran wrote:

mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -I -d /dev/dm-2

Be careful before killing. We want to be sure that dev is not mounted.

On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:

Again, the outputs:
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
---> here should be volgr1-lvol0, I guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think I have to enable debug first somehow?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:

What does this return?
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev

Also, do:
ls -lR /sys/kernel/debug/ocfs2
ls -lR /sys/kernel/debug/o2dlm

On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:

Here is the output:

ls -lR /sys/kernel/config/cluster
/sys/kernel/config/cluster:
total 0
drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER

/sys/kernel/config/cluster/CLUSTER:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
-rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
-rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
drwxr-xr-x 4 root root    0 Oct 11 20:23 node
-rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms

/sys/kernel/config/cluster/CLUSTER/heartbeat:
total 0
drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/*918673F06F8F4ED188DDCE14F39945F6*:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
-r--r--r-- 1 root root 4096 Oct 19 00:12 pid
-rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block

/sys/kernel/config/cluster/CLUSTER/node:
total 0
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num

/sys/kernel/

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
Yes, I did reformat it (more than once, I think, last week). This is
a pre-production system and I'm trying various options before moving
into real life.



On 10/19/2011 01:19, Sunil Mushran wrote:

Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

Looks like we have different UUIDs. Where is this coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D


On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2


mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid


There is no /dev/dm-2 mounted.


On 10/19/2011 00:27, Sunil Mushran wrote:

mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -I -d /dev/dm-2

Be careful before killing. We want to be sure that dev is 
not mounted.


On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:

Again, the outputs:
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
---> here should be volgr1-lvol0, I guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think I have to enable debug first somehow?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:

What does this return?
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev


Also, do:
ls -lR /sys/kernel/debug/ocfs2
ls -lR /sys/kernel/debug/o2dlm

On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:

Here is the output:

ls -lR /sys/kernel/config/cluster
/sys/kernel/config/cluster:
total 0
drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER

/sys/kernel/config/cluster/CLUSTER:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
-rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
-rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
drwxr-xr-x 4 root root    0 Oct 11 20:23 node
-rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms

/sys/kernel/config/cluster/CLUSTER/heartbeat:
total 0
drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/*918673F06F8F4ED188DDCE14F39945F6*:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
-r--r--r-- 1 root root 4096 Oct 19 00:12 pid
-rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block

/sys/kernel/config/cluster/CLUSTER/node:
total 0
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num




On 10/19/2011 00:12, Sunil Mushran wrote:

ls -lR /sys/kernel/conf

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran

Did you reformat the volume recently? or, when did you format last?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:

well..this is weird
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

looks like we have different UUIDs. Where is this coming from??

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

 ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2

mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid

There is no /dev/dm-2 mounted.


On 10/19/2011 00:27, Sunil Mushran wrote:

mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -l -d /dev/dm-2

Be careful before killing. We want to be sure that dev is not mounted.

On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:

Again the outputs:
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
--->here should be volgr1-lvol0 i guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think i have to enable debug first somehow..?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:

What does this return?
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev

Also, do:
ls -lR /sys/kernel/debug/ocfs2
ls -lR /sys/kernel/debug/o2dlm

On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:

Here is the output:

ls -lR /sys/kernel/config/cluster
/sys/kernel/config/cluster:
total 0
drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER

/sys/kernel/config/cluster/CLUSTER:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
-rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
-rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
drwxr-xr-x 4 root root    0 Oct 11 20:23 node
-rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms

/sys/kernel/config/cluster/CLUSTER/heartbeat:
total 0
drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
-r--r--r-- 1 root root 4096 Oct 19 00:12 pid
-rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block

/sys/kernel/config/cluster/CLUSTER/node:
total 0
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num




On 10/19/2011 00:12, Sunil Mushran wrote:

ls -lR /sys/kernel/config/cluster

What does this return?

On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:

Hi,
I have a 2 nodes ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
My problem is that

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu

well..this is weird
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
*918673F06F8F4ED188DDCE14F39945F6*  dead_threshold

looks like we have different UUIDs. Where is this coming from??

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs
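
The mismatch above can be checked mechanically. What follows is only a sketch, not part of ocfs2-tools: `stale_regions` is a hypothetical helper name, and it assumes the heartbeat directory layout shown in the `ls -lR` output in this thread, where each region is a subdirectory named after a UUID. It prints every region name that differs from the UUID that `mounted.ocfs2 -d` reports for the volume.

```shell
#!/bin/sh
# Hypothetical helper (not an ocfs2-tools command).
# Prints heartbeat regions whose name differs from the volume's current UUID.
#   $1: configfs heartbeat directory, e.g.
#       /sys/kernel/config/cluster/CLUSTER/heartbeat
#   $2: UUID reported by "mounted.ocfs2 -d" for the volume
stale_regions() {
    hb_dir=$1
    uuid=$2
    for region in "$hb_dir"/*/; do
        # If the glob matched nothing it stays literal; skip it.
        [ -d "$region" ] || continue
        name=$(basename "$region")
        # A region named after a UUID the volume no longer carries is a
        # candidate leftover, for example from before a reformat.
        [ "$name" = "$uuid" ] || printf '%s\n' "$name"
    done
}

# Example call with the values from this thread:
#   stale_regions /sys/kernel/config/cluster/CLUSTER/heartbeat \
#       0C4AB55FE9314FA5A9F81652FDB9B22D
```

Plain files such as dead_threshold are ignored because the glob only matches directories. The helper only reports; it does not stop or remove anything.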


On 10/19/2011 01:04, Sunil Mushran wrote:

Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D


On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:

 ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:

See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:

ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs


On 10/19/2011 00:43, Sunil Mushran wrote:

ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:

mounted.ocfs2 -d
Device                    FS     Stack  UUID                              Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2

mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2
 ls /dev/dm-*
/dev/dm-0  /dev/dm-1


On 10/19/2011 00:37, Sunil Mushran wrote:

So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:

ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid


There is no /dev/dm-2 mounted.


On 10/19/2011 00:27, Sunil Mushran wrote:

mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -l -d /dev/dm-2

Be careful before killing. We want to be sure that dev is not 
mounted.


On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:

Again the outputs:
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
--->here should be volgr1-lvol0 i guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think i have to enable debug first somehow..?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:

What does this return?
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev


Also, do:
ls -lR /sys/kernel/debug/ocfs2
ls -lR /sys/kernel/debug/o2dlm

On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:

Here is the output:

ls -lR /sys/kernel/config/cluster
/sys/kernel/config/cluster:
total 0
drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER

/sys/kernel/config/cluster/CLUSTER:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
-rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
-rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
drwxr-xr-x 4 root root    0 Oct 11 20:23 node
-rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms

/sys/kernel/config/cluster/CLUSTER/heartbeat:
total 0
drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
-r--r--r-- 1 root root 4096 Oct 19 00:12 pid
-rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block

/sys/kernel/config/cluster/CLUSTER/node:
total 0
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num




On 10/19/2011 00:12, Sunil Mushran wrote:

ls -lR /sys/kernel/config/cluster

What does this return?

On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:

Hi,
I have a 2 nodes ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
My problem is that all the time when i try to run /etc/init.d/o2cb stop
it fails with this error:
  Stopping O

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
Let's do it by hand.
rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
>  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>
> No improvement :(
>
>
> On 10/19/2011 00:50, Sunil Mushran wrote:
>> See if this cleans it up.
>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>
>> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>>
>>>
>>> On 10/19/2011 00:43, Sunil Mushran wrote:
 ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D

 On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
> mounted.ocfs2 -d
> Device                    FS     Stack  UUID                              Label
> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>
> mounted.ocfs2 -f
> Device                    FS     Nodes
> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>
> ro02xsrv001 = the other node in the cluster.
>
> By the way, there is no /dev/dm-2
>  ls /dev/dm-*
> /dev/dm-0  /dev/dm-1
>
>
> On 10/19/2011 00:37, Sunil Mushran wrote:
>> So it is not mounted. But we still have a hb thread because
>> hb could not be stopped during umount. The reason for that
>> could be the same that causes ocfs2_hb_ctl to fail.
>>
>> Do:
>> mounted.ocfs2 -d
>>
>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>> ls -lR /sys/kernel/debug/ocfs2
>>> /sys/kernel/debug/ocfs2:
>>> total 0
>>>
>>> ls -lR /sys/kernel/debug/o2dlm
>>> /sys/kernel/debug/o2dlm:
>>> total 0
>>>
>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>
>>> There is no /dev/dm-2 mounted.
>>>
>>>
>>> On 10/19/2011 00:27, Sunil Mushran wrote:
 mount -t debugfs debugfs /sys/kernel/debug

 Then list that dir.

 Also, do:
 ocfs2_hb_ctl -l -d /dev/dm-2

 Be careful before killing. We want to be sure that dev is not mounted.

 On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
> Again the outputs:
> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
> dm-2
> --->here should be volgr1-lvol0 i guess?
>
> ls -lR /sys/kernel/debug/ocfs2
> ls: /sys/kernel/debug/ocfs2: No such file or directory
>
> ls -lR /sys/kernel/debug/o2dlm
> ls: /sys/kernel/debug/o2dlm: No such file or directory
>
> I think i have to enable debug first somehow..?
>
> Laurentiu.
>
> On 10/19/2011 00:17, Sunil Mushran wrote:
>> What does this return?
>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>
>> Also, do:
>> ls -lR /sys/kernel/debug/ocfs2
>> ls -lR /sys/kernel/debug/o2dlm
>>
>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>> Here is the output:
>>>
>>> ls -lR /sys/kernel/config/cluster
>>> /sys/kernel/config/cluster:
>>> total 0
>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>
>>> /sys/kernel/config/cluster/CLUSTER:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>> total 0
>>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node:
>>> total 0
>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>> -rw-r--r-- 

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
See if this cleans it up.
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>
>
> On 10/19/2011 00:43, Sunil Mushran wrote:
>> ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>
>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>> mounted.ocfs2 -d
>>> Device                    FS     Stack  UUID                              Label
>>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>>
>>> mounted.ocfs2 -f
>>> Device                    FS     Nodes
>>> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>>>
>>> ro02xsrv001 = the other node in the cluster.
>>>
>>> By the way, there is no /dev/dm-2
>>>  ls /dev/dm-*
>>> /dev/dm-0  /dev/dm-1
>>>
>>>
>>> On 10/19/2011 00:37, Sunil Mushran wrote:
 So it is not mounted. But we still have a hb thread because
 hb could not be stopped during umount. The reason for that
 could be the same that causes ocfs2_hb_ctl to fail.

 Do:
 mounted.ocfs2 -d

 On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
> ls -lR /sys/kernel/debug/ocfs2
> /sys/kernel/debug/ocfs2:
> total 0
>
> ls -lR /sys/kernel/debug/o2dlm
> /sys/kernel/debug/o2dlm:
> total 0
>
> ocfs2_hb_ctl -I -d /dev/dm-2
> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>
> There is no /dev/dm-2 mounted.
>
>
> On 10/19/2011 00:27, Sunil Mushran wrote:
>> mount -t debugfs debugfs /sys/kernel/debug
>>
>> Then list that dir.
>>
>> Also, do:
>> ocfs2_hb_ctl -l -d /dev/dm-2
>>
>> Be careful before killing. We want to be sure that dev is not mounted.
>>
>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>> Again the outputs:
>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>> dm-2
>>> --->here should be volgr1-lvol0 i guess?
>>>
>>> ls -lR /sys/kernel/debug/ocfs2
>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>
>>> ls -lR /sys/kernel/debug/o2dlm
>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>
>>> I think i have to enable debug first somehow..?
>>>
>>> Laurentiu.
>>>
>>> On 10/19/2011 00:17, Sunil Mushran wrote:
 What does this return?
 cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev

 Also, do:
 ls -lR /sys/kernel/debug/ocfs2
 ls -lR /sys/kernel/debug/o2dlm

 On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
> Here is the output:
>
> ls -lR /sys/kernel/config/cluster
> /sys/kernel/config/cluster:
> total 0
> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>
> /sys/kernel/config/cluster/CLUSTER:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>
> /sys/kernel/config/cluster/CLUSTER/heartbeat:
> total 0
> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>
> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>
> /sys/kernel/config/cluster/CLUSTER/node:
> total 0
> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>
> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>
> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>
>
>
>
> On 10/19/2011 00:12, Sunil Mushran wrote:
>> ls -lR /sys/kernel/config/

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:
> See if this cleans it up.
> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>
> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>
>>
>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>> ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>
>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
 mounted.ocfs2 -d
 Device                    FS     Stack  UUID                              Label
 /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2

 mounted.ocfs2 -f
 Device                    FS     Nodes
 /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

 ro02xsrv001 = the other node in the cluster.

 By the way, there is no /dev/dm-2
  ls /dev/dm-*
 /dev/dm-0  /dev/dm-1


 On 10/19/2011 00:37, Sunil Mushran wrote:
> So it is not mounted. But we still have a hb thread because
> hb could not be stopped during umount. The reason for that
> could be the same that causes ocfs2_hb_ctl to fail.
>
> Do:
> mounted.ocfs2 -d
>
> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>> ls -lR /sys/kernel/debug/ocfs2
>> /sys/kernel/debug/ocfs2:
>> total 0
>>
>> ls -lR /sys/kernel/debug/o2dlm
>> /sys/kernel/debug/o2dlm:
>> total 0
>>
>> ocfs2_hb_ctl -I -d /dev/dm-2
>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>
>> There is no /dev/dm-2 mounted.
>>
>>
>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>> mount -t debugfs debugfs /sys/kernel/debug
>>>
>>> Then list that dir.
>>>
>>> Also, do:
>>> ocfs2_hb_ctl -l -d /dev/dm-2
>>>
>>> Be careful before killing. We want to be sure that dev is not 
>>> mounted.
>>>
>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
 Again the outputs:
 cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
 dm-2
 --->here should be volgr1-lvol0 i guess?

 ls -lR /sys/kernel/debug/ocfs2
 ls: /sys/kernel/debug/ocfs2: No such file or directory

 ls -lR /sys/kernel/debug/o2dlm
 ls: /sys/kernel/debug/o2dlm: No such file or directory

 I think i have to enable debug first somehow..?

 Laurentiu.

 On 10/19/2011 00:17, Sunil Mushran wrote:
> What does this return?
> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>
> Also, do:
> ls -lR /sys/kernel/debug/ocfs2
> ls -lR /sys/kernel/debug/o2dlm
>
> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>> Here is the output:
>>
>> ls -lR /sys/kernel/config/cluster
>> /sys/kernel/config/cluster:
>> total 0
>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>
>> /sys/kernel/config/cluster/CLUSTER:
>> total 0
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>
>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>> total 0
>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>
>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>> total 0
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>
>> /sys/kernel/config/cluster/CLUSTER/node:
>> total 0
>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>
>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>> total 0
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>
>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>> total 0
>>

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
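
For scripting this check, the reference count can be pulled out of that output. This is a rough sketch rather than anything shipped with ocfs2-tools: `hb_refs` is a hypothetical name, and the pattern assumes the `<UUID>: <N> refs` line format exactly as it appears in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not an ocfs2-tools command).
# Reads one or more lines of "ocfs2_hb_ctl -I" output on stdin and prints the
# reference count from each line of the form "<32 hex digits>: <N> refs".
# Lines that do not match, such as error messages, produce no output.
hb_refs() {
    sed -n 's/^[0-9A-Fa-f]\{32\}: \([0-9][0-9]*\) refs$/\1/p'
}

# Example call:
#   ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D | hb_refs
```

An empty result means the tool printed something other than a ref count, which is worth treating separately from a count of 0.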


On 10/19/2011 00:43, Sunil Mushran wrote:
> ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>
> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>> mounted.ocfs2 -d
>> Device                    FS     Stack  UUID                              Label
>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>
>> mounted.ocfs2 -f
>> Device                    FS     Nodes
>> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>>
>> ro02xsrv001 = the other node in the cluster.
>>
>> By the way, there is no /dev/dm-2
>>  ls /dev/dm-*
>> /dev/dm-0  /dev/dm-1
>>
>>
>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>> So it is not mounted. But we still have a hb thread because
>>> hb could not be stopped during umount. The reason for that
>>> could be the same that causes ocfs2_hb_ctl to fail.
>>>
>>> Do:
>>> mounted.ocfs2 -d
>>>
>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
 ls -lR /sys/kernel/debug/ocfs2
 /sys/kernel/debug/ocfs2:
 total 0

 ls -lR /sys/kernel/debug/o2dlm
 /sys/kernel/debug/o2dlm:
 total 0

 ocfs2_hb_ctl -I -d /dev/dm-2
 ocfs2_hb_ctl: Device name specified was not found while reading uuid

 There is no /dev/dm-2 mounted.


 On 10/19/2011 00:27, Sunil Mushran wrote:
> mount -t debugfs debugfs /sys/kernel/debug
>
> Then list that dir.
>
> Also, do:
> ocfs2_hb_ctl -l -d /dev/dm-2
>
> Be careful before killing. We want to be sure that dev is not 
> mounted.
>
> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>> Again the outputs:
>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>> dm-2
>> --->here should be volgr1-lvol0 i guess?
>>
>> ls -lR /sys/kernel/debug/ocfs2
>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>
>> ls -lR /sys/kernel/debug/o2dlm
>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>
>> I think i have to enable debug first somehow..?
>>
>> Laurentiu.
>>
>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>> What does this return?
>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>
>>> Also, do:
>>> ls -lR /sys/kernel/debug/ocfs2
>>> ls -lR /sys/kernel/debug/o2dlm
>>>
>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
 Here is the output:

 ls -lR /sys/kernel/config/cluster
 /sys/kernel/config/cluster:
 total 0
 drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER

 /sys/kernel/config/cluster/CLUSTER:
 total 0
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
 drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
 drwxr-xr-x 4 root root    0 Oct 11 20:23 node
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms

 /sys/kernel/config/cluster/CLUSTER/heartbeat:
 total 0
 drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold

 /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
 total 0
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
 -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block

 /sys/kernel/config/cluster/CLUSTER/node:
 total 0
 drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
 drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002

 /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
 total 0
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 num

 /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
 total 0
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
 -rw-r--r-- 1 root root 4096 Oct 19 00:12 num




 On 10/19/2011 00:12, Sunil Mushran wrote:
> ls -lR /sys/kernel/config/cluster
>
> What does this return?
>
> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>> Hi,
>> I have a 2 nodes ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
>> ocfs2console-1.6.

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D

On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
> mounted.ocfs2 -d
> Device                    FS     Stack  UUID                              Label
> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>
> mounted.ocfs2 -f
> Device                    FS     Nodes
> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>
> ro02xsrv001 = the other node in the cluster.
>
> By the way, there is no /dev/dm-2
>  ls /dev/dm-*
> /dev/dm-0  /dev/dm-1
>
>
> On 10/19/2011 00:37, Sunil Mushran wrote:
>> So it is not mounted. But we still have a hb thread because
>> hb could not be stopped during umount. The reason for that
>> could be the same that causes ocfs2_hb_ctl to fail.
>>
>> Do:
>> mounted.ocfs2 -d
>>
>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>> ls -lR /sys/kernel/debug/ocfs2
>>> /sys/kernel/debug/ocfs2:
>>> total 0
>>>
>>> ls -lR /sys/kernel/debug/o2dlm
>>> /sys/kernel/debug/o2dlm:
>>> total 0
>>>
>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>
>>> There is no /dev/dm-2 mounted.
>>>
>>>
>>> On 10/19/2011 00:27, Sunil Mushran wrote:
 mount -t debugfs debugfs /sys/kernel/debug

 Then list that dir.

 Also, do:
 ocfs2_hb_ctl -l -d /dev/dm-2

 Be careful before killing. We want to be sure that dev is not mounted.

 On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
> Again the outputs:
> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
> dm-2
> --->here should be volgr1-lvol0 i guess?
>
> ls -lR /sys/kernel/debug/ocfs2
> ls: /sys/kernel/debug/ocfs2: No such file or directory
>
> ls -lR /sys/kernel/debug/o2dlm
> ls: /sys/kernel/debug/o2dlm: No such file or directory
>
> I think i have to enable debug first somehow..?
>
> Laurentiu.
>
> On 10/19/2011 00:17, Sunil Mushran wrote:
>> What does this return?
>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>
>> Also, do:
>> ls -lR /sys/kernel/debug/ocfs2
>> ls -lR /sys/kernel/debug/o2dlm
>>
>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>> Here is the output:
>>>
>>> ls -lR /sys/kernel/config/cluster
>>> /sys/kernel/config/cluster:
>>> total 0
>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>
>>> /sys/kernel/config/cluster/CLUSTER:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>> total 0
>>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node:
>>> total 0
>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>
>>>
>>>
>>>
>>> On 10/19/2011 00:12, Sunil Mushran wrote:
 ls -lR /sys/kernel/config/cluster

 What does this return?

 On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
> Hi,
> I have a 2 nodes ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
> My problem is that all the time when i try to run /etc/init.d/o2cb stop
> it fails with this error:
>   Stopping O2CB cluster CLUSTER: Failed
>   Unable to stop cluster as heartbeat region still active
> There is no

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
mounted.ocfs2 -d
DeviceFS Stack  UUID  Label
/dev/mapper/volgr1-lvol0  ocfs2  o2cb   
0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2

mounted.ocfs2 -f
Device                    FS     Nodes
/dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001

ro02xsrv001 = the other node in the cluster.

By the way, there is no /dev/dm-2:
  ls /dev/dm-*
/dev/dm-0  /dev/dm-1
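
One way to cross-check this is to compare the heartbeat region names in configfs with the UUIDs that mounted.ocfs2 -d reports. The following is a minimal sketch, assuming each region directory is named after a volume UUID, as the listings in this thread suggest. find_unmatched_regions is a hypothetical helper name, not part of ocfs2-tools.

```shell
#!/bin/sh
# Hypothetical helper: print every heartbeat region whose name matches
# none of the volume UUIDs passed on the command line.
#   $1     heartbeat directory (e.g. /sys/kernel/config/cluster/CLUSTER/heartbeat)
#   $2...  UUIDs of the known OCFS2 volumes (e.g. taken from mounted.ocfs2 -d)
find_unmatched_regions() {
    hb_dir="$1"
    shift
    for region in "$hb_dir"/*/; do
        # Skip when the glob matched nothing (no regions at all).
        [ -d "$region" ] || continue
        uuid=$(basename "$region")
        matched=no
        for known in "$@"; do
            if [ "$uuid" = "$known" ]; then
                matched=yes
            fi
        done
        if [ "$matched" = no ]; then
            echo "$uuid"
        fi
    done
}
```

For example: find_unmatched_regions /sys/kernel/config/cluster/CLUSTER/heartbeat 0C4AB55FE9314FA5A9F81652FDB9B22D. Given the listings in this thread, that call would be expected to print 918673F06F8F4ED188DDCE14F39945F6, the region that no current volume accounts for.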


On 10/19/2011 00:37, Sunil Mushran wrote:
> So it is not mounted. But we still have a hb thread because
> hb could not be stopped during umount. The reason for that
> could be the same that causes ocfs2_hb_ctl to fail.
>
> Do:
> mounted.ocfs2 -d
>
> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>> ls -lR /sys/kernel/debug/ocfs2
>> /sys/kernel/debug/ocfs2:
>> total 0
>>
>> ls -lR /sys/kernel/debug/o2dlm
>> /sys/kernel/debug/o2dlm:
>> total 0
>>
>> ocfs2_hb_ctl -I -d /dev/dm-2
>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>
>> There is no /dev/dm-2 mounted.
>>
>>
>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>> mount -t debugfs debugfs /sys/kernel/debug
>>>
>>> Then list that dir.
>>>
>>> Also, do:
>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>
>>> Be careful before killing. We want to be sure that dev is not mounted.

Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
So it is not mounted. But we still have a hb thread because
hb could not be stopped during umount. The reason for that
could be the same that causes ocfs2_hb_ctl to fail.

Do:
mounted.ocfs2 -d
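
Whether a heartbeat thread is still running for a region can also be read from configfs, since each region directory has a pid file. The following is a minimal sketch, assuming that file holds the PID of the region's heartbeat thread (the thread does not confirm this). hb_thread_alive and its optional second argument are hypothetical names used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the heartbeat thread recorded in a
# region's pid file still exists.
#   $1  region directory (e.g. .../heartbeat/918673F06F8F4ED188DDCE14F39945F6)
#   $2  proc root, defaults to /proc (overridable for testing)
hb_thread_alive() {
    region_dir="$1"
    proc_root="${2:-/proc}"
    # An unreadable or empty pid file is treated as "no thread".
    pid=$(cat "$region_dir/pid" 2>/dev/null) || return 1
    [ -n "$pid" ] && [ -d "$proc_root/$pid" ]
}
```

For example: hb_thread_alive /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6 && echo "hb thread still running".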

On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
> ls -lR /sys/kernel/debug/ocfs2
> /sys/kernel/debug/ocfs2:
> total 0
>
> ls -lR /sys/kernel/debug/o2dlm
> /sys/kernel/debug/o2dlm:
> total 0
>
> ocfs2_hb_ctl -I -d /dev/dm-2
> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>
> There is no /dev/dm-2 mounted.
>
>
> On 10/19/2011 00:27, Sunil Mushran wrote:
>> mount -t debugfs debugfs /sys/kernel/debug
>>
>> Then list that dir.
>>
>> Also, do:
>> ocfs2_hb_ctl -I -d /dev/dm-2
>>
>> Be careful before killing. We want to be sure that dev is not mounted.
>>
>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>> Again   the outputs:
>>>  cat 
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>> dm-2
>>> --->here should be volgr1-lvol0 i guess?
>>>
>>> ls -lR /sys/kernel/debug/ocfs2
>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>
>>> ls -lR /sys/kernel/debug/o2dlm
>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>
>>> I think i have to enable debug first somehow..?
>>>
>>> Laurentiu.


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
ls -lR /sys/kernel/debug/ocfs2
/sys/kernel/debug/ocfs2:
total 0

ls -lR /sys/kernel/debug/o2dlm
/sys/kernel/debug/o2dlm:
total 0

ocfs2_hb_ctl -I -d /dev/dm-2
ocfs2_hb_ctl: Device name specified was not found while reading uuid

There is no /dev/dm-2 mounted.


On 10/19/2011 00:27, Sunil Mushran wrote:
> mount -t debugfs debugfs /sys/kernel/debug
>
> Then list that dir.
>
> Also, do:
> ocfs2_hb_ctl -I -d /dev/dm-2
>
> Be careful before killing. We want to be sure that dev is not mounted.
>
> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>> Again   the outputs:
>>  cat 
>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>> dm-2
>> --->here should be volgr1-lvol0 i guess?
>>
>> ls -lR /sys/kernel/debug/ocfs2
>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>
>> ls -lR /sys/kernel/debug/o2dlm
>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>
>> I think i have to enable debug first somehow..?
>>
>> Laurentiu.
>>
>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>> What does this return?
>>> cat 
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>
>>> Also, do:
>>> ls -lR /sys/kernel/debug/ocfs2
>>> ls -lR /sys/kernel/debug/o2dlm


Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
mount -t debugfs debugfs /sys/kernel/debug

Then list that dir.

Also, do:
ocfs2_hb_ctl -I -d /dev/dm-2

Be careful before killing. We want to be sure that dev is not mounted.
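
The "is it mounted" check can be scripted before any attempt to kill the heartbeat. The following is a minimal sketch, assuming the device shows up in /proc/mounts under exactly the name you pass in. The comparison is literal, so /dev/dm-2 and a /dev/mapper/... alias of the same device would not match each other. dev_is_mounted is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if device $1 is listed as a mount source.
#   $1  device path exactly as it would appear in the mounts file
#   $2  mounts file, defaults to /proc/mounts (overridable for testing)
dev_is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # Field 1 of each mounts line is the mount source.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}
```

For example: dev_is_mounted /dev/mapper/volgr1-lvol0 && echo "still mounted, do not kill the heartbeat".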

On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
> Again   the outputs:
>  cat 
> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
> dm-2
> --->here should be volgr1-lvol0 i guess?
>
> ls -lR /sys/kernel/debug/ocfs2
> ls: /sys/kernel/debug/ocfs2: No such file or directory
>
> ls -lR /sys/kernel/debug/o2dlm
> ls: /sys/kernel/debug/o2dlm: No such file or directory
>
> I think i have to enable debug first somehow..?
>
> Laurentiu.
>
> On 10/19/2011 00:17, Sunil Mushran wrote:
>> What does this return?
>> cat 
>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>
>> Also, do:
>> ls -lR /sys/kernel/debug/ocfs2
>> ls -lR /sys/kernel/debug/o2dlm
>>
>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>> Here is the output:
>>>
>>> ls -lR /sys/kernel/config/cluster
>>> /sys/kernel/config/cluster:
>>> total 0
>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>
>>> /sys/kernel/config/cluster/CLUSTER:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>> total 0
>>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>
>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node:
>>> total 0
>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>
>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>> total 0
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num


Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
Again, the outputs:
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
dm-2
---> here should be volgr1-lvol0, I guess?

ls -lR /sys/kernel/debug/ocfs2
ls: /sys/kernel/debug/ocfs2: No such file or directory

ls -lR /sys/kernel/debug/o2dlm
ls: /sys/kernel/debug/o2dlm: No such file or directory

I think I have to enable debug first somehow?

Laurentiu.

On 10/19/2011 00:17, Sunil Mushran wrote:
> What does this return?
> cat 
> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>
> Also, do:
> ls -lR /sys/kernel/debug/ocfs2
> ls -lR /sys/kernel/debug/o2dlm
>
> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>> Here is the output:
>>
>> ls -lR /sys/kernel/config/cluster
>> /sys/kernel/config/cluster:
>> total 0
>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>
>> /sys/kernel/config/cluster/CLUSTER:
>> total 0
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>
>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>> total 0
>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>
>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>> total 0
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>
>> /sys/kernel/config/cluster/CLUSTER/node:
>> total 0
>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>
>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>> total 0
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>
>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>> total 0
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>
>>
>>
>>
>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>> ls -lR /sys/kernel/config/cluster
>>>
>>> What does this return?


Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
What does this return?
cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev

Also, do:
ls -lR /sys/kernel/debug/ocfs2
ls -lR /sys/kernel/debug/o2dlm
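
The same information can be collected for every region in one pass. The following is a minimal sketch, assuming the configfs layout shown in the listing quoted below (a heartbeat directory containing one subdirectory per region, each with dev and pid files). list_hb_regions is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "UUID dev=<dev> pid=<pid>" for each heartbeat
# region under a cluster's configfs directory.
#   $1  cluster directory (e.g. /sys/kernel/config/cluster/CLUSTER)
list_hb_regions() {
    hb_dir="$1/heartbeat"
    for region in "$hb_dir"/*/; do
        # Skip when the glob matched nothing (no active regions).
        [ -d "$region" ] || continue
        uuid=$(basename "$region")
        dev=$(cat "$region/dev" 2>/dev/null)
        pid=$(cat "$region/pid" 2>/dev/null)
        # Print "?" for any attribute that could not be read.
        printf '%s dev=%s pid=%s\n' "$uuid" "${dev:-?}" "${pid:-?}"
    done
}
```

For example: list_hb_regions /sys/kernel/config/cluster/CLUSTER. Plain files in the heartbeat directory, such as dead_threshold, are skipped because only directories are treated as regions.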

On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
> Here is the output:
>
> ls -lR /sys/kernel/config/cluster
> /sys/kernel/config/cluster:
> total 0
> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>
> /sys/kernel/config/cluster/CLUSTER:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>
> /sys/kernel/config/cluster/CLUSTER/heartbeat:
> total 0
> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>
> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>
> /sys/kernel/config/cluster/CLUSTER/node:
> total 0
> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>
> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>
> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
> total 0
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>
>
>
>
> On 10/19/2011 00:12, Sunil Mushran wrote:
>> ls -lR /sys/kernel/config/cluster
>>
>> What does this return?
>>
>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>> Hi,
>>> I have a 2 nodes ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
>>> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
>>> My problem is that all the time when i try to run /etc/init.d/o2cb stop
>>> it fails with this error:
>>>   Stopping O2CB cluster CLUSTER: Failed
>>>   Unable to stop cluster as heartbeat region still active
>>> There is no active mount point. I tried to manually stop the heartbeat
>>> with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
>>> the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0 ").
>>> But even if refs number is set to zero the "heartbeat region still
>>> active" occurs.
>>> How can i fix this?
>>>
>>> Thank you in advance.
>>> Laurentiu.
>>>
>>>
>>> ___
>>> Ocfs2-users mailing list
>>> Ocfs2-users@oss.oracle.com
>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>


Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Laurentiu Gosu
Here is the output:

ls -lR /sys/kernel/config/cluster
/sys/kernel/config/cluster:
total 0
drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER

/sys/kernel/config/cluster/CLUSTER:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
-rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
-rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
drwxr-xr-x 4 root root    0 Oct 11 20:23 node
-rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms

/sys/kernel/config/cluster/CLUSTER/heartbeat:
total 0
drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold

/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
-rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
-rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
-r--r--r-- 1 root root 4096 Oct 19 00:12 pid
-rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block

/sys/kernel/config/cluster/CLUSTER/node:
total 0
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num

/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
total 0
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
-rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
-rw-r--r-- 1 root root 4096 Oct 19 00:12 local
-rw-r--r-- 1 root root 4096 Oct 19 00:12 num




On 10/19/2011 00:12, Sunil Mushran wrote:
> ls -lR /sys/kernel/config/cluster
>
> What does this return?
>
> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>> Hi,
>> I have a 2 nodes ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
>> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
>> My problem is that all the time when i try to run /etc/init.d/o2cb stop
>> it fails with this error:
>>   Stopping O2CB cluster CLUSTER: Failed
>>   Unable to stop cluster as heartbeat region still active
>> There is no active mount point. I tried to manually stop the heartbeat
>> with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
>> the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0 ").
>> But even if refs number is set to zero the "heartbeat region still
>> active" occurs.
>> How can i fix this?
>>
>> Thank you in advance.
>> Laurentiu.
>>
>>
>> ___
>> Ocfs2-users mailing list
>> Ocfs2-users@oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>


Re: [Ocfs2-users] Unable to stop cluster as heartbeat region still active

2011-10-18 Thread Sunil Mushran
ls -lR /sys/kernel/config/cluster

What does this return?

On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
> Hi,
> I have a 2 nodes ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
> My problem is that all the time when i try to run /etc/init.d/o2cb stop
> it fails with this error:
>   Stopping O2CB cluster CLUSTER: Failed
>   Unable to stop cluster as heartbeat region still active
> There is no active mount point. I tried to manually stop the heartbeat
> with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
> the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0 ").
> But even if refs number is set to zero the "heartbeat region still
> active" occurs.
> How can i fix this?
>
> Thank you in advance.
> Laurentiu.
>
>
> ___
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users

