Re: [Gluster-users] How to remove a dead node and re-balance volume?

Vijay Bellur Wed, 04 Sep 2013 12:13:07 -0700

On 09/03/2013 01:18 PM, Anup Nair wrote:

Glusterfs version 3.2.2


I have a Gluster volume in which one out of the 4 peers/nodes had
crashed some time ago, prior to my joining service here.

I see from volume info that the crashed (non-existing) node is still
listed as one of the peers and the bricks are also listed. I would like
to detach this node and its bricks and rebalance the volume with the
remaining 3 peers. But I am unable to do so. Here are my steps:

1. #gluster peer status
   Number of Peers: 3 -- (note: excluding the one I run this command from)

   Hostname: dbstore4r294 --- (note: node/peer that is down)
   Uuid: 8bf13458-1222-452c-81d3-565a563d768a
   State: Peer in Cluster (Disconnected)

   Hostname: 172.16.1.90
   Uuid: 77ebd7e4-7960-4442-a4a4-00c5b99a61b4
   State: Peer in Cluster (Connected)

   Hostname: dbstore3r294
   Uuid: 23d7a18c-fe57-47a0-afbc-1e1a5305c0eb
   State: Peer in Cluster (Connected)

2. #gluster peer detach dbstore4r294
   Brick(s) with the peer dbstore4r294 exist in cluster

3. #gluster volume info

   Volume Name: test-volume
   Type: Distributed-Replicate
   Status: Started
   Number of Bricks: 4 x 2 = 8
   Transport-type: tcp
   Bricks:
   Brick1: dbstore1r293:/datastore1
   Brick2: dbstore2r293:/datastore1
   Brick3: dbstore3r294:/datastore1
   Brick4: dbstore4r294:/datastore1
   Brick5: dbstore1r293:/datastore2
   Brick6: dbstore2r293:/datastore2
   Brick7: dbstore3r294:/datastore2
   Brick8: dbstore4r294:/datastore2
   Options Reconfigured:
   network.ping-timeout: 42s
   performance.cache-size: 64MB
   performance.write-behind-window-size: 3MB
   performance.io-thread-count: 8
   performance.cache-refresh-timeout: 2

Note that the non-existent node/peer is dbstore4r294 (its bricks are
/datastore1 and /datastore2, i.e. Brick4 and Brick8).

4. #gluster volume remove-brick test-volume dbstore4r294:/datastore1
   Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
   Remove brick incorrect brick count of 1 for replica 2

5. #gluster volume remove-brick test-volume dbstore4r294:/datastore1 dbstore4r294:/datastore2
   Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
   Bricks not from same subvol for replica

How do I remove the peer? What are the steps considering that the node
is non-existent?

Do you plan to replace the dead server with a new server? If so, this could be a possible sequence of steps:


1. Peer probe new server and have two bricks committed

2. volume replace-brick <volname> <brick4> <new-brick1> commit force

3. volume replace-brick <volname> <brick8> <new-brick2> commit force

4. peer detach dead server.

5. Since 3.2.2 is being used here, you would need to crawl the volume from a client mount point (find . | xargs stat) to trigger self-healing for the newly added bricks.
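
As a rough sketch, the steps above could be laid out as the dry-run script below. It only prints the commands so they can be reviewed before anything is run. The replacement hostname (dbstore5r294), its brick paths, and the client mount point (/mnt/test-volume) are assumptions made for illustration and do not come from the thread; the volume name and dead host are taken from the volume info quoted above.

```shell
#!/bin/sh
# Dry-run sketch: prints the gluster commands for replacing a dead peer's bricks.
# Nothing is executed against the cluster; review the output, then run by hand.

VOLNAME="test-volume"           # from the quoted "gluster volume info"
DEAD_HOST="dbstore4r294"        # the peer that no longer exists
NEW_HOST="dbstore5r294"         # ASSUMPTION: hypothetical replacement server
MOUNT_POINT="/mnt/test-volume"  # ASSUMPTION: hypothetical client mount of the volume

plan() {
    # 1. Add the new server to the trusted pool.
    echo "gluster peer probe ${NEW_HOST}"

    # 2-3. Swap each brick of the dead server for a brick on the new server.
    #      Reusing the same brick paths on the new host is an assumption.
    for brick in /datastore1 /datastore2; do
        echo "gluster volume replace-brick ${VOLNAME} ${DEAD_HOST}:${brick} ${NEW_HOST}:${brick} commit force"
    done

    # 4. Once the dead peer holds no bricks, it can be detached.
    echo "gluster peer detach ${DEAD_HOST}"

    # 5. On 3.2.x, stat every file through a client mount to trigger self-heal.
    echo "cd ${MOUNT_POINT} && find . | xargs stat > /dev/null"
}

plan
```

Running the script prints five commands in order. Checking "gluster volume info" after the two replace-brick steps, before detaching the dead peer, would confirm that Brick4 and Brick8 now point at the new server.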


-Vijay