[Ocfs2-users] Removing a node from cluster.conf (on a specific node)

2012-04-29 Thread Sébastien Riccio
Hi dear list,

I think the subject might already have been discussed, but I can only find 
old threads about removing a node from the cluster.

I was hoping that in 2012 it would be possible to dynamically add/remove 
nodes from a shared filesystem but this evening I had this problem:

I wanted to add a node named xen-blade11, with IP 10.111.10.111, to our 
ocfs2 cluster.

So on every other node I ran this command:

o2cb_ctl -C -i -n xen-blade11 -t node -a number=5 -a ip_address=10.111.10.111 -a ip_port= -a cluster=ocfs2

This successfully added the node on every cluster node except 
xen-blade16.

On every node the original cluster.conf was:

node:
 ip_port = 
 ip_address = 10.111.10.116
 number = 0
 name = xen-blade16
 cluster = ocfs2

node:
 ip_port = 
 ip_address = 10.111.10.115
 number = 1
 name = xen-blade15
 cluster = ocfs2

node:
 ip_port = 
 ip_address = 10.111.10.114
 number = 2
 name = xen-blade14
 cluster = ocfs2

node:
 ip_port = 
 ip_address = 10.111.10.113
 number = 3
 name = xen-blade13
 cluster = ocfs2

node:
 ip_port = 
 ip_address = 10.111.10.112
 number = 4
 name = xen-blade12
 cluster = ocfs2

cluster:
 node_count = 5
 name = ocfs2


After adding the node, on every cluster.conf I can see that this was added:

node:
 ip_port = 
 ip_address = 10.111.10.111
 number = 5
 name = xen-blade11
 cluster = ocfs2

cluster:
 node_count = 6
 name = ocfs2

EXCEPT on xen-blade16, where it was added like this:

node:
 ip_port = 
 ip_address = 10.111.10.111
 number = 6
 name = xen-blade11
 cluster = ocfs2

cluster:
 node_count = 6
 name = ocfs2

(Notice the number = 6 instead of number = 5)
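
A mismatch like this can be caught before a join attempt by comparing the cluster.conf copies collected from each host. What follows is only a minimal Python sketch, not part of ocfs2-tools: the function names are hypothetical, and it assumes the simple "stanza:" / "key = value" layout shown above.

```python
from collections import defaultdict


def parse_cluster_conf(text):
    """Parse cluster.conf text into a list of (stanza_type, {key: value})."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" or "cluster:"
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def node_numbers(text):
    """Map node name -> node number for every node stanza."""
    return {
        fields["name"]: int(fields["number"])
        for kind, fields in parse_cluster_conf(text)
        if kind == "node"
    }


def find_mismatches(confs):
    """confs maps host -> cluster.conf text.

    Returns {node_name: {number: [hosts]}} for every node whose number
    is not the same in all of the supplied files.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for host, text in confs.items():
        for name, number in node_numbers(text).items():
            seen[name][number].append(host)
    return {
        name: {num: sorted(hosts) for num, hosts in by_num.items()}
        for name, by_num in seen.items()
        if len(by_num) > 1
    }
```

Fed the files from xen-blade15 and xen-blade16 as they stand after the add, find_mismatches would report xen-blade11 with number 5 on one host and 6 on the other.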

So now, when I try to connect xen-blade11, every host accepts the 
connection except xen-blade16, and the cluster join is rejected, 
as we can see in the kernel messages on xen-blade11:

[ 1852.729539] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1852.729892] o2net: Connected to node xen-blade12 (num 4) at 10.111.10.112:
[ 1852.737122] o2net: Connected to node xen-blade14 (num 2) at 10.111.10.114:
[ 1852.741408] o2net: Connected to node xen-blade15 (num 1) at 10.111.10.115:
[ 1854.733759] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1856.737129] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1856.764520] OCFS2 1.5.0
[ 1858.740877] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1860.744847] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1862.748919] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1864.752929] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1866.756825] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1868.760809] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1870.764937] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1872.768905] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1874.772947] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1876.776928] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1878.780828] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1880.784974] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
[ 1882.784529] o2net: No connection established with node 0 after 30.0 seconds, giving up.
[ 1912.864531] o2net: No connection established with node 0 after 30.0 seconds, giving up.
[ 1917.028531] o2cb: This node could not connect to nodes: 0.
[ 1917.028684] o2cb: Cluster check failed. Fix errors before retrying.
[ 1917.028758] (mount.ocfs2,4238,4):ocfs2_dlm_init:3001 ERROR: status = -107
[ 1917.028880] (mount.ocfs2,4238,4):ocfs2_mount_volume:1879 ERROR: status = -107
[ 1917.029005] ocfs2: Unmounting device (254,5) on (node 0)
[ 1917.029022] (mount.ocfs2,4238,4):ocfs2_fill_super:1234 ERROR: status = -107
[ 1918.860551] o2net: No longer connected to node xen-blade15 (num 1) at 10.111.10.115:
[ 1918.860599] o2net: No longer connected to node xen-blade14 (num 2) at 10.111.10.114:
[ 1918.860636] o2net: No longer connected to node xen-blade12 (num 4) at 10.111.10.112:
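
To see at a glance which peers accepted the connection and which kept dropping it, the o2net lines can be tallied. The following is a rough Python sketch under stated assumptions: the regular expressions are written only against the message wording in the log above (they tolerate the mail line wrapping), and the function name is hypothetical. On Linux, status -107 corresponds to ENOTCONN ("Transport endpoint is not connected"), which fits the picture of node 0 never accepting the link.

```python
import re
from collections import Counter

# Patterns modelled on the o2net messages quoted in this thread.
# \s+ is used between tokens so that e-mail line wraps do not break a match.
CONNECTED = re.compile(r"o2net:\s+Connected\s+to\s+node\s+(\S+)\s+\(num\s+(\d+)\)")
SHUTDOWN = re.compile(
    r"o2net:\s+Connection\s+to\s+node\s+(\S+)\s+\(num\s+(\d+)\)\s+at\s+\S+\s+shutdown"
)


def summarize(log_text):
    """Return (connected, refused) for a chunk of kernel log text.

    connected maps node number -> node name for every "Connected to node" line.
    refused maps (node number, node name) -> count of "shutdown" lines.
    """
    connected = {int(m.group(2)): m.group(1) for m in CONNECTED.finditer(log_text)}
    refused = Counter(
        (int(m.group(2)), m.group(1)) for m in SHUTDOWN.finditer(log_text)
    )
    return connected, dict(refused)


if __name__ == "__main__":
    import sys

    ok, bad = summarize(sys.stdin.read())
    for num, name in sorted(ok.items()):
        print(f"connected: {name} (num {num})")
    for (num, name), count in sorted(bad.items()):
        print(f"refused:   {name} (num {num}), {count} attempts")
```

It can be fed from a pipe, for example with the output of dmesg saved to a file on xen-blade11.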

Okay so far, I thought I would try to remove that node from xen-blade16 
and re-add it again, but...


Re: [Ocfs2-users] Removing a node from cluster.conf (on a specific node)

2012-04-29 Thread Sunil Mushran
Online add/remove of nodes and of global heartbeat devices has been in mainline 
for over a year. I think that means kernel 2.6.38+ and tools 1.8. The ocfs2-tools 
tree hosted on oss.oracle.com/git has a 1.8.2 tag that can be used safely. It has 
been fully tested. The user's guide has been moved to man pages bundled with the 
tools. Run man ocfs2 after building and installing the tools.

Re: [Ocfs2-users] Removing a node from cluster.conf (on a specific node)

2012-04-29 Thread Sébastien Riccio
Hi Sunil,

Thanks for your reply and help. OK, I'm doing this on XenServer/XCP and 
the kernel is 2.6.32:

Linux xen-blade16 2.6.32.12-0.7.1.xs1.1.0.327.170596xen #1 SMP Wed Nov 9 12:50:53 CET 2011 i686 i686 i386 GNU/Linux

and the tools are the ones for CentOS 5.x (XCP is based on it):

[root@xen-blade16 ~]# o2cb_ctl -V
o2cb_ctl version 1.4.4

That would explain it :)
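
For anyone wanting to check their own hosts, the comparison can be scripted. This is a small Python sketch, not an official check: the thresholds are the ones Sunil recalled ("I think 2.6.38+ and tools 1.8"), so they are approximate, and the function names are hypothetical. Distribution kernels may also carry backports that a plain version comparison cannot see.

```python
import re

# Assumed minimums, taken from the reply in this thread; treat as approximate.
MIN_KERNEL = (2, 6, 38)
MIN_TOOLS = (1, 8)


def version_tuple(version):
    """Turn the leading dotted-number part of a version string into a tuple.

    Anything after the first non-numeric separator (such as a vendor
    suffix following "-") is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_online_changes(kernel_release, tools_version):
    """True if both versions meet the assumed minimums for online add/remove."""
    return (
        version_tuple(kernel_release) >= MIN_KERNEL
        and version_tuple(tools_version) >= MIN_TOOLS
    )
```

The kernel string would come from uname -r and the tools string from o2cb_ctl -V, as shown above.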

Pardon my ignorance: I think I'm stuck and will need to build a 
new kernel, unless updating the tools alone is enough to be able to remove a 
node...

Thanks again,
Sébastien





