Hi Sunil,

Thanks for your reply and help. OK, I'm doing this on XenServer/XCP and 
the kernel is 2.6.32:

Linux xen-blade16 2.6.32.12-0.7.1.xs1.1.0.327.170596xen #1 SMP Wed Nov 9 
12:50:53 CET 2011 i686 i686 i386 GNU/Linux

and the tools are the ones for CentOS 5.x (XCP is based on it):

[root@xen-blade16 ~]# o2cb_ctl -V
o2cb_ctl version 1.4.4

That would explain it :)
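
A minimal Python sketch of that comparison, assuming the thresholds Sunil
recalls in his reply quoted below (kernel 2.6.38+, tools 1.8) are right. He
says "I think", so treat them as approximate. The function names are my own
and not part of ocfs2-tools:

```python
import re

# Thresholds as recalled in the quoted reply; approximate, not authoritative.
MIN_KERNEL = (2, 6, 38)
MIN_TOOLS = (1, 8)


def leading_version(text):
    """Return the first dotted numeric version found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_online_node_changes(kernel_release, tools_banner):
    """True if both kernel and tools meet the assumed minimum versions."""
    kernel = leading_version(kernel_release)
    tools = leading_version(tools_banner)
    return kernel[:3] >= MIN_KERNEL and tools[:2] >= MIN_TOOLS


if __name__ == "__main__":
    # Strings taken from the `uname` and `o2cb_ctl -V` output above.
    kernel_release = "2.6.32.12-0.7.1.xs1.1.0.327.170596xen"
    tools_banner = "o2cb_ctl version 1.4.4"
    print(supports_online_node_changes(kernel_release, tools_banner))
```

With the two strings from this host, both the kernel and the tools fall below
the assumed minimums.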

Pardon my ignorance, but I think I'm stuck and will need to build a 
new kernel, unless updating the tools alone is enough to be able to remove a 
node...

Thanks again,
Sébastien






On 30.04.2012 03:54, Sunil Mushran wrote:
> Online add/remove of nodes and of global heartbeat devices has been in 
> mainline for over a year. I think 2.6.38+ and tools 1.8. The ocfs2-tools tree 
> hosted on oss.oracle.com/git has a 1.8.2 tag that can be used safely. It has 
> been fully tested. The user's guide has been moved to man pages bundled with 
> the tools. Do man ocfs2 after building and installing the tools.
>
> On Apr 29, 2012, at 1:21 PM, Sébastien Riccio<s...@swisscenter.com>  wrote:
>
>> Hi dear list,
>>
>> I think the subject may already have been discussed, but I could only find
>> old threads about removing a node from the cluster.
>>
>> I was hoping that in 2012 it would be possible to dynamically add/remove
>> nodes from a shared filesystem but this evening I had this problem:
>>
>> I wanted to add a node to our ocfs2 cluster, node named xen-blade11 with
>> ip 10.111.10.111
>>
>> So on every other node I ran this command:
>>
>> o2cb_ctl -C -i -n xen-blade11 -t node -a number=5 -a
>> ip_address=10.111.10.111 -a ip_port=7777 -a cluster=ocfs2
>>
>> Which successfully added the node on every cluster node, except on
>> xen-blade16.
>>
>> On every node the original cluster.conf was:
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.111.10.116
>>          number = 0
>>          name = xen-blade16
>>          cluster = ocfs2
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.111.10.115
>>          number = 1
>>          name = xen-blade15
>>          cluster = ocfs2
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.111.10.114
>>          number = 2
>>          name = xen-blade14
>>          cluster = ocfs2
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.111.10.113
>>          number = 3
>>          name = xen-blade13
>>          cluster = ocfs2
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.111.10.112
>>          number = 4
>>          name = xen-blade12
>>          cluster = ocfs2
>>
>> cluster:
>>          node_count = 5
>>          name = ocfs2
>>
>>
>> After adding the node, on every cluster.conf I can see that this was added:
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.111.10.111
>>          number = 5
>>          name = xen-blade11
>>          cluster = ocfs2
>>
>> cluster:
>>          node_count = 6
>>          name = ocfs2
>>
>> EXCEPT on xen-blade16
>>
>> It added like this:
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.111.10.111
>>          number = 6
>>          name = xen-blade11
>>          cluster = ocfs2
>>
>> cluster:
>>          node_count = 6
>>          name = ocfs2
>>
>> (Notice the number = 6 instead of number = 5)
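
This kind of drift can be spotted before trying to join by comparing the node
numbers in each host's cluster.conf. Below is a minimal Python sketch under
some assumptions: the file contents have already been collected from each host
(usually /etc/ocfs2/cluster.conf), the sample data is abbreviated to two node
stanzas, and the function names are hypothetical, not part of ocfs2-tools.

```python
def parse_cluster_conf(text):
    """Parse stanzas like 'node:' / 'cluster:' into (kind, {key: value}) pairs."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" starts a new record.
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def node_numbers(text):
    """Map node name -> node number for every node stanza in one config."""
    return {
        fields["name"]: int(fields["number"])
        for kind, fields in parse_cluster_conf(text)
        if kind == "node"
    }


def find_mismatches(configs):
    """configs maps host -> config text; return nodes numbered differently per host."""
    seen = {}
    for host, text in configs.items():
        for name, number in node_numbers(text).items():
            seen.setdefault(name, {})[host] = number
    return {
        name: per_host
        for name, per_host in seen.items()
        if len(set(per_host.values())) > 1
    }


# Abbreviated sample: only two of the six node stanzas are listed.
CONF_BLADE15 = """
node:
        ip_port = 7777
        ip_address = 10.111.10.116
        number = 0
        name = xen-blade16
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 10.111.10.111
        number = 5
        name = xen-blade11
        cluster = ocfs2

cluster:
        node_count = 6
        name = ocfs2
"""
# Same file as seen on xen-blade16, where the new node got number 6.
CONF_BLADE16 = CONF_BLADE15.replace("number = 5", "number = 6")

if __name__ == "__main__":
    configs = {"xen-blade15": CONF_BLADE15, "xen-blade16": CONF_BLADE16}
    print(find_mismatches(configs))
```

For the sample data it reports xen-blade11 as number 5 on xen-blade15 and
number 6 on xen-blade16, which is the mismatch described above.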
>>
>> So now when I try to connect xen-blade11, every host accepts the
>> connection except xen-blade16, and the cluster join is
>> rejected,
>>
>> as we can see in the kernel messages on xen-blade11:
>>
>> [ 1852.729539] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1852.729892] o2net: Connected to node xen-blade12 (num 4) at
>> 10.111.10.112:7777
>> [ 1852.737122] o2net: Connected to node xen-blade14 (num 2) at
>> 10.111.10.114:7777
>> [ 1852.741408] o2net: Connected to node xen-blade15 (num 1) at
>> 10.111.10.115:7777
>> [ 1854.733759] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1856.737129] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1856.764520] OCFS2 1.5.0
>> [ 1858.740877] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1860.744847] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1862.748919] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1864.752929] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1866.756825] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1868.760809] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1870.764937] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1872.768905] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1874.772947] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1876.776928] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1878.780828] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1880.784974] o2net: Connection to node xen-blade16 (num 0) at
>> 10.111.10.116:7777 shutdown, state 7
>> [ 1882.784529] o2net: No connection established with node 0 after 30.0
>> seconds, giving up.
>> [ 1912.864531] o2net: No connection established with node 0 after 30.0
>> seconds, giving up.
>> [ 1917.028531] o2cb: This node could not connect to nodes: 0.
>> [ 1917.028684] o2cb: Cluster check failed. Fix errors before retrying.
>> [ 1917.028758] (mount.ocfs2,4238,4):ocfs2_dlm_init:3001 ERROR: status = -107
>> [ 1917.028880] (mount.ocfs2,4238,4):ocfs2_mount_volume:1879 ERROR:
>> status = -107
>> [ 1917.029005] ocfs2: Unmounting device (254,5) on (node 0)
>> [ 1917.029022] (mount.ocfs2,4238,4):ocfs2_fill_super:1234 ERROR: status
>> = -107
>> [ 1918.860551] o2net: No longer connected to node xen-blade15 (num 1) at
>> 10.111.10.115:7777
>> [ 1918.860599] o2net: No longer connected to node xen-blade14 (num 2) at
>> 10.111.10.114:7777
>> [ 1918.860636] o2net: No longer connected to node xen-blade12 (num 4) at
>> 10.111.10.112:7777
>>
>> Okay, so I thought I would try to remove that node from xen-blade16
>> and re-add it, but...
>>
>> [root@xen-blade16 ~]# o2cb_ctl -D -n xen-blade11
>> o2cb_ctl: Not yet supported
>>
>> (Not yet supported, how long is "yet"?)
>>
>> Please tell me that there is a way to clean this up so I can attach
>> xen-blade11 to the cluster.
>> I mean, isn't OCFS2 supposed to be a production-ready filesystem,
>> meaning that you can add/remove
>> nodes without having to shut down the cluster?
>>
>> I can't do that: it's in production, and I can't even consider shutting
>> down the single node xen-blade16.
>> That would require me to migrate its virtual machines (taking almost 64GB of
>> RAM on that server) to another server in the cluster, but we have no
>> free server (that's why I'm adding xen-blade11 to the cluster...).
>>
>> Even adding a new server with another name would lead to the same
>> problem: every other node would add it as node number 6, but it would be
>> node number 7 on xen-blade16. Same problem again...
>>
>> Please help :)
>>
>> Thanks for reading.
>>
>> Cheers,
>> Sébastien
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users@oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users

