[Ocfs2-users] Removing a node from cluster.conf (on a specific node)
Hi dear list,

I think the subject might already have been discussed, but I could only find old threads about removing a node from the cluster. I was hoping that in 2012 it would be possible to dynamically add/remove nodes on a shared filesystem, but this evening I ran into the following problem.

I wanted to add a node named xen-blade11, with IP 10.111.10.111, to our ocfs2 cluster. So on every other node I ran this command:

    o2cb_ctl -C -i -n xen-blade11 -t node -a number=5 -a ip_address=10.111.10.111 -a ip_port= -a cluster=ocfs2

This successfully added the node on every cluster node except xen-blade16.

On every node, the original cluster.conf was:

    node:
        ip_port =
        ip_address = 10.111.10.116
        number = 0
        name = xen-blade16
        cluster = ocfs2

    node:
        ip_port =
        ip_address = 10.111.10.115
        number = 1
        name = xen-blade15
        cluster = ocfs2

    node:
        ip_port =
        ip_address = 10.111.10.114
        number = 2
        name = xen-blade14
        cluster = ocfs2

    node:
        ip_port =
        ip_address = 10.111.10.113
        number = 3
        name = xen-blade13
        cluster = ocfs2

    node:
        ip_port =
        ip_address = 10.111.10.112
        number = 4
        name = xen-blade12
        cluster = ocfs2

    cluster:
        node_count = 5
        name = ocfs2

After adding the node, I can see that every cluster.conf now contains:

    node:
        ip_port =
        ip_address = 10.111.10.111
        number = 5
        name = xen-blade11
        cluster = ocfs2

    cluster:
        node_count = 6
        name = ocfs2

EXCEPT on xen-blade16, where it was added like this:

    node:
        ip_port =
        ip_address = 10.111.10.111
        number = 6
        name = xen-blade11
        cluster = ocfs2

    cluster:
        node_count = 6
        name = ocfs2

(Notice number = 6 instead of number = 5.)

So now, when I try to connect xen-blade11, every host accepts the connection except xen-blade16, and the cluster join is rejected.
As we can see in the kernel messages on xen-blade11:

    [ 1852.729539] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1852.729892] o2net: Connected to node xen-blade12 (num 4) at 10.111.10.112:
    [ 1852.737122] o2net: Connected to node xen-blade14 (num 2) at 10.111.10.114:
    [ 1852.741408] o2net: Connected to node xen-blade15 (num 1) at 10.111.10.115:
    [ 1854.733759] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1856.737129] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1856.764520] OCFS2 1.5.0
    [ 1858.740877] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1860.744847] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1862.748919] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1864.752929] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1866.756825] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1868.760809] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1870.764937] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1872.768905] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1874.772947] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1876.776928] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1878.780828] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1880.784974] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116: shutdown, state 7
    [ 1882.784529] o2net: No connection established with node 0 after 30.0 seconds, giving up.
    [ 1912.864531] o2net: No connection established with node 0 after 30.0 seconds, giving up.
    [ 1917.028531] o2cb: This node could not connect to nodes: 0.
    [ 1917.028684] o2cb: Cluster check failed. Fix errors before retrying.
    [ 1917.028758] (mount.ocfs2,4238,4):ocfs2_dlm_init:3001 ERROR: status = -107
    [ 1917.028880] (mount.ocfs2,4238,4):ocfs2_mount_volume:1879 ERROR: status = -107
    [ 1917.029005] ocfs2: Unmounting device (254,5) on (node 0)
    [ 1917.029022] (mount.ocfs2,4238,4):ocfs2_fill_super:1234 ERROR: status = -107
    [ 1918.860551] o2net: No longer connected to node xen-blade15 (num 1) at 10.111.10.115:
    [ 1918.860599] o2net: No longer connected to node xen-blade14 (num 2) at 10.111.10.114:
    [ 1918.860636] o2net: No longer connected to node xen-blade12 (num 4) at 10.111.10.112:

OK, so I thought I would try to remove that node from xen-blade16 and re-add it, but...
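The symptom in this message comes down to one node name (xen-blade11) carrying different node numbers in different copies of cluster.conf: 5 on most nodes, 6 on xen-blade16. One way to catch that kind of drift before mounting is to gather each node's /etc/ocfs2/cluster.conf and compare the name-to-number mapping. What follows is a minimal sketch, not an ocfs2-tools feature. The helper name check_node_numbers is hypothetical, and the sketch assumes the stanza layout quoted above: stanza headers start in column 1, and fields are indented and written as "key = value" with spaces around the "=".

```shell
# Sketch: report node names whose "number" differs between copies of
# cluster.conf. check_node_numbers is a hypothetical helper, not part of
# ocfs2-tools. Assumes stanzas laid out as in this thread, e.g.
#   node:
#           number = 5
#           name = xen-blade11
# with fields written as "key = value" (spaces around "=").
check_node_numbers() {
    awk '
        # Reset per-stanza state at the start of each input file.
        FNR == 1 { section = ""; name = ""; num = "" }
        # Stanza headers ("node:", "cluster:") start in column 1.
        /^[A-Za-z_]+:[ \t]*$/ {
            section = $1; sub(/:$/, "", section)
            name = ""; num = ""
            next
        }
        section == "node" && $1 == "name"   { name = $3 }
        section == "node" && $1 == "number" { num = $3 }
        # Once both fields of a node stanza are known, compare them with
        # the first file in which this name was seen.
        section == "node" && name != "" && num != "" {
            if (!(name in seen)) {
                seen[name] = num; where[name] = FILENAME
            } else if (seen[name] != num) {
                printf "conflict: %s is number %s in %s but %s in %s\n",
                       name, num, FILENAME, seen[name], where[name]
                bad = 1
            }
            name = ""; num = ""
        }
        # Non-zero exit status if any conflict was found.
        END { exit bad }
    ' "$@"
}
```

For example, with the files from this thread copied locally as xen-blade15.conf and xen-blade16.conf (hypothetical file names), running check_node_numbers xen-blade15.conf xen-blade16.conf should print one conflict line for xen-blade11 (number 6 in xen-blade16.conf but 5 in xen-blade15.conf) and return a non-zero status. The name = ocfs2 line in the cluster: stanza is ignored because only node: stanzas are tracked.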
Re: [Ocfs2-users] Removing a node from cluster.conf (on a specific node)
Online add/remove of nodes and of global heartbeat devices has been in mainline for over a year. I think 2.6.38+ and tools 1.8. The ocfs2-tools tree hosted on oss.oracle.com/git has a 1.8.2 tag that can be used safely. It has been fully tested.

The user's guide has been moved to man pages bundled with the tools. Do man ocfs2 after building and installing the tools.

On Apr 29, 2012, at 1:21 PM, Sébastien Riccio s...@swisscenter.com wrote:
> I was hoping that in 2012 it would be possible to dynamically add/remove nodes from a shared filesystem [...]
Re: [Ocfs2-users] Removing a node from cluster.conf (on a specific node)
Hi Sunil,

Thanks for your reply and help.

OK, I'm doing this on XenServer/XCP, and the kernel is 2.6.32:

    Linux xen-blade16 2.6.32.12-0.7.1.xs1.1.0.327.170596xen #1 SMP Wed Nov 9 12:50:53 CET 2011 i686 i686 i386 GNU/Linux

and the tools are the ones for CentOS 5.x (XCP is based on it):

    [root@xen-blade16 ~]# o2cb_ctl -V
    o2cb_ctl version 1.4.4

That would explain it :)

Pardon my ignorance, but I think I'm stuck and will need to build a new kernel, unless updating the tools is enough to be able to remove a node...

Thanks again,
Sébastien

On 30.04.2012 03:54, Sunil Mushran wrote:
> Online add/remove of nodes and of global heartbeat devices has been in mainline for over a year. I think 2.6.38+ and tools 1.8. [...]
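Sunil's rule of thumb (kernel 2.6.38+ and ocfs2-tools 1.8, given with an "I think") can be checked against what a node actually runs. Here is a minimal sketch under stated assumptions: version_ge and report_ocfs2_versions are hypothetical helpers, version_ge compares dot-separated numeric components and drops everything from the first "-" onward (so the XCP kernel string compares as 2.6.32.12), and the tools version is taken as the last word of o2cb_ctl -V output, assuming the "o2cb_ctl version X.Y.Z" format shown above. The sketch avoids sort -V, since the older coreutils on CentOS 5 may not support it.

```shell
# Sketch: compare dotted version strings without relying on "sort -V".
# version_ge A B succeeds (exit 0) when A >= B. Anything from the first
# "-" onward is dropped, so a vendor kernel string such as
# 2.6.32.12-0.7.1.xs1.1.0.327.170596xen compares as 2.6.32.12.
# version_ge is a hypothetical helper, not part of ocfs2-tools.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        sub(/-.*/, "", a); sub(/-.*/, "", b)
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing components count as 0, so 1.8 compares equal to 1.8.0.
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Hypothetical report using the thresholds from Sunil's reply.
# Assumes "o2cb_ctl -V" prints "o2cb_ctl version X.Y.Z", as in this thread.
report_ocfs2_versions() {
    kver=$(uname -r)
    tver=$(o2cb_ctl -V 2>/dev/null | awk '{ print $NF }')
    if version_ge "$kver" 2.6.38; then k=ok; else k=old; fi
    if [ -n "$tver" ] && version_ge "$tver" 1.8; then t=ok; else t=old; fi
    echo "kernel $kver: $k (want >= 2.6.38)"
    echo "o2cb_ctl ${tver:-not found}: $t (want >= 1.8)"
}
```

On xen-blade16 as described (kernel 2.6.32.12-0.7.1.xs1..., o2cb_ctl 1.4.4), report_ocfs2_versions would mark both the kernel and the tools as old. That fits Sébastien's "That would explain it": updating only the tools would still leave the kernel below the version Sunil mentions.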