Hi Thorsten,

>>>> ## node0 booted outside cluster (-x)
>>> Why are you booting the node out of the cluster?
>>
>> I am trying to work out a procedure to restore a failed cluster node 
>> on different hardware, in which case I cannot assume that the 
>> interconnect will come up as the CLI interfaces might have changed.
> 
> Now I am confused. So let me add some more context and see if this is 
> what you are doing.
> 
> The starting point is a working two-node cluster (let's call them node-a 
> and node-b).
> 
> A diskset gets configured for both nodes.
> 
> One node fails and is no longer available. Let's assume this is node-b.
> 
> You should still be able to boot node-a in cluster mode

Correct.

> If you then determine node-b to be not repairable/restorable, you should
> be able to remove node-b from the diskset by using:
>
> root@node-a# metaset -s <disksetname> -df -h node-b

which is exactly what I am trying to do. In the case I posted, the failed node 
is node0, and I am running the following on node1 (booted in cluster mode):

root@pub2-node1:~# time metaset -s pub2-node0 -d -f -h pub2-node0
Proxy command to: pub2-node0
172.16.4.1: RPC: Rpcbind failure - RPC: Timed out
rpc failure
real    1m0.110s
user    0m0.068s
sys     0m0.026s
root@pub2-node1:~# metaset

Set name = nfs-dg, Set number = 1

Host                Owner
   pub2-node0
   pub2-node1         Yes

Driv Dbase
d1   Yes
d2   Yes

I've tried the same on Solaris 10 with Sun Cluster 3.2 and didn't succeed 
there either.

Regarding one side aspect:

> Thus I am not sure what kind of interconnect or CLI interface issues you 
> expect.

I was referring to the fact that in the recovery scenario I am trying to 
solve, it might not be possible to form a cluster: the failed node may get 
restored onto different hardware, so the (restored) cluster configuration 
would still contain interconnect adapters which don't exist on the (changed) 
hardware.
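
Just to illustrate the kind of mismatch I mean (the command names assume 
SC 3.2 or later, and "new-node" is only a placeholder): on the surviving 
cluster node,

root@pub2-node1:~# clinterconnect show

lists the transport adapters the cluster configuration still expects for the 
failed node, while on the replacement hardware

root@new-node:~# dladm show-link

shows the links that actually exist there. If the two don't match, the 
restored node cannot rejoin over the configured interconnect.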

> I would assume that you need to remove the node from other things like 
> resource groups, quorum device, etc, before you actually perform the 
> "clnode clear -F node-b" from node-a (again being in cluster mode).
> 
> "clnode remove" would only be used if the node you want to remove is 
> still bootable into non-cluster mode.
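
If I understand you correctly, the rough sequence on node-a would then be 
something like the following (the resource group and quorum device names are 
just placeholders):

root@node-a# clresourcegroup remove-node -n node-b <rgname>
root@node-a# clquorum remove <quorumdevice>
root@node-a# clnode clear -F node-b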

Please let me come back to this point in detail later; I currently can't 
access my development environment :-(

> Or are you trying to remove a dead node, and then later add a different 
> new node?

This is the plan. In my development environment the removed node and the 
newly added node happen to be the same machine, but that is just a 
simplification.


Again, thank you very much for taking the time to discuss these issues.

Nils
