Hi Sunil,

Thanks for your reply and help. OK, I'm doing this on XenServer/XCP and the kernel is a 2.6.32:

Linux xen-blade16 2.6.32.12-0.7.1.xs1.1.0.327.170596xen #1 SMP Wed Nov 9 12:50:53 CET 2011 i686 i686 i386 GNU/Linux

and the tools are the ones for CentOS 5.x (XCP is based on it):

[root@xen-blade16 ~]# o2cb_ctl -V
o2cb_ctl version 1.4.4

That would explain it :)

Pardon my ignorance, but I think I'm stuck and will need to build a new kernel, unless updating the tools is enough to be able to remove a node...

Thanks again,
Sébastien

On 30.04.2012 03:54, Sunil Mushran wrote:
> Online add/remove of nodes and of global heartbeat devices has been in
> mainline for over a year. I think 2.6.38+ and tools 1.8. The ocfs2-tools tree
> hosted on oss.oracle.com/git has a 1.8.2 tag that can be used safely. It has
> been fully tested. The user's guide has been moved to man pages bundled with
> the tools. Do man ocfs2 after building and installing the tools.
>
> On Apr 29, 2012, at 1:21 PM, Sébastien Riccio <s...@swisscenter.com> wrote:
>
>> Hi dear list,
>>
>> I think this subject may already have been discussed, but I could only find
>> old threads about removing a node from the cluster.
>>
>> I was hoping that in 2012 it would be possible to dynamically add/remove
>> nodes from a shared filesystem, but this evening I had this problem:
>>
>> I wanted to add a node to our ocfs2 cluster, node named xen-blade11 with
>> ip 10.111.10.111.
>>
>> So on every other node I ran this command:
>>
>> o2cb_ctl -C -i -n xen-blade11 -t node -a number=5 -a ip_address=10.111.10.111 -a ip_port=7777 -a cluster=ocfs2
>>
>> which successfully added the node on every cluster node, except on
>> xen-blade16.
>>
>> On every node the original cluster.conf was:
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.111.10.116
>>         number = 0
>>         name = xen-blade16
>>         cluster = ocfs2
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.111.10.115
>>         number = 1
>>         name = xen-blade15
>>         cluster = ocfs2
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.111.10.114
>>         number = 2
>>         name = xen-blade14
>>         cluster = ocfs2
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.111.10.113
>>         number = 3
>>         name = xen-blade13
>>         cluster = ocfs2
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.111.10.112
>>         number = 4
>>         name = xen-blade12
>>         cluster = ocfs2
>>
>> cluster:
>>         node_count = 5
>>         name = ocfs2
>>
>> After adding the node, in every cluster.conf I can see that this was added:
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.111.10.111
>>         number = 5
>>         name = xen-blade11
>>         cluster = ocfs2
>>
>> cluster:
>>         node_count = 6
>>         name = ocfs2
>>
>> EXCEPT on xen-blade16, where it was added like this:
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.111.10.111
>>         number = 6
>>         name = xen-blade11
>>         cluster = ocfs2
>>
>> cluster:
>>         node_count = 6
>>         name = ocfs2
>>
>> (Notice the number = 6 instead of number = 5.)
>>
>> So now when I'm trying to connect xen-blade11, every host accepts the
>> connection except xen-blade16, and the cluster join is being rejected.
>>
>> As we can see in the kernel messages on xen-blade11:
>>
>> [ 1852.729539] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1852.729892] o2net: Connected to node xen-blade12 (num 4) at 10.111.10.112:7777
>> [ 1852.737122] o2net: Connected to node xen-blade14 (num 2) at 10.111.10.114:7777
>> [ 1852.741408] o2net: Connected to node xen-blade15 (num 1) at 10.111.10.115:7777
>> [ 1854.733759] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1856.737129] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1856.764520] OCFS2 1.5.0
>> [ 1858.740877] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1860.744847] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1862.748919] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1864.752929] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1866.756825] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1868.760809] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1870.764937] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1872.768905] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1874.772947] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1876.776928] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1878.780828] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1880.784974] o2net: Connection to node xen-blade16 (num 0) at 10.111.10.116:7777 shutdown, state 7
>> [ 1882.784529] o2net: No connection established with node 0 after 30.0 seconds, giving up.
>> [ 1912.864531] o2net: No connection established with node 0 after 30.0 seconds, giving up.
>> [ 1917.028531] o2cb: This node could not connect to nodes: 0.
>> [ 1917.028684] o2cb: Cluster check failed. Fix errors before retrying.
>> [ 1917.028758] (mount.ocfs2,4238,4):ocfs2_dlm_init:3001 ERROR: status = -107
>> [ 1917.028880] (mount.ocfs2,4238,4):ocfs2_mount_volume:1879 ERROR: status = -107
>> [ 1917.029005] ocfs2: Unmounting device (254,5) on (node 0)
>> [ 1917.029022] (mount.ocfs2,4238,4):ocfs2_fill_super:1234 ERROR: status = -107
>> [ 1918.860551] o2net: No longer connected to node xen-blade15 (num 1) at 10.111.10.115:7777
>> [ 1918.860599] o2net: No longer connected to node xen-blade14 (num 2) at 10.111.10.114:7777
>> [ 1918.860636] o2net: No longer connected to node xen-blade12 (num 4) at 10.111.10.112:7777
>>
>> OK, so far I thought I would try to remove that node from xen-blade16
>> and re-add it, but...
>>
>> [root@xen-blade16 ~]# o2cb_ctl -D -n xen-blade11
>> o2cb_ctl: Not yet supported
>>
>> (Not yet supported; how long is "yet"?)
>>
>> Please tell me that there is a way to clean this up so I can attach
>> xen-blade11 to the cluster. Isn't OCFS2 supposed to be a production-ready
>> filesystem, meaning that you can add/remove nodes without having to shut
>> down the cluster?
>>
>> I can't do that: it's in production and I can't even consider shutting
>> down the single node xen-blade16. That would require migrating virtual
>> machines (taking almost 64GB of RAM on that server) to another server in
>> the cluster, but we have no free server (that's why I'm adding xen-blade11
>> to the cluster...).
>>
>> Even adding a new server with another name will lead to the same problem:
>> on every node it will be added as node number 6, but it will be node
>> number 7 on xen-blade16... Same problem again...
>>
>> Please help :)
>>
>> Thanks for reading me.
>>
>> Cheers,
>> Sébastien
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users@oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
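[For readers hitting the same mismatch: on a pre-1.8 tools stack like the one in this thread, the usual workaround is to make /etc/ocfs2/cluster.conf identical on every node by hand and restart the o2cb stack on the out-of-sync node. A quick way to spot the divergent copy is to check that the node numbers run 0..N-1 with no gaps and that node_count matches; the helper below is a hypothetical sketch, not part of ocfs2-tools:]

```shell
# Hypothetical helper (not part of ocfs2-tools): sanity-check a cluster.conf.
# It counts the "number = N" stanza lines, verifies the numbers form the
# contiguous range 0..N-1 with no duplicates, and checks that node_count
# equals the number of node stanzas. Run against /etc/ocfs2/cluster.conf on
# each node; a host printing MISMATCH (xen-blade16 in this thread, which has
# number = 6 but no number = 5) holds a divergent config.
check_conf() {
    awk '
        /^[ \t]*number[ \t]*=/     { seen[$3]++; nodes++ }   # per-node stanza
        /^[ \t]*node_count[ \t]*=/ { count = $3 }            # cluster stanza
        END {
            ok = (count == nodes)
            for (i = 0; i < nodes; i++)
                if (seen[i] != 1) ok = 0    # gap or duplicate node number
            print (ok ? "OK" : "MISMATCH")
        }' "$1"
}
```

On xen-blade16 this would report MISMATCH, since node number 5 is absent while number 6 is present; on the other five nodes it would report OK.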