On 09/05/2013 02:16 AM, Anup Nair wrote:
On Thu, Sep 5, 2013 at 12:41 AM, Vijay Bellur wrote:
On 09/03/2013 01:18 PM, Anup Nair wrote:
Glusterfs version 3.2.2
I have a Gluster volume in which one of the 4 peers/nodes crashed
some time ago, before I joined the team here.
I see from volume info that the crashed (non-existent) node is still
listed as one of the peers, and its bricks are also listed. I would like
to detach this node and its bricks and rebalance the volume across the
remaining 3 peers, but I am unable to do so. Here are my steps:
1. #gluster peer status
Number of Peers: 3 -- (note: excluding the one I run this command from)
Hostname: dbstore4r294 --- (note: node/peer that is down)
Uuid: 8bf13458-1222-452c-81d3-565a563d768a
State: Peer in Cluster (Disconnected)
Hostname: 172.16.1.90
Uuid: 77ebd7e4-7960-4442-a4a4-00c5b99a61b4
State: Peer in Cluster (Connected)
Hostname: dbstore3r294
Uuid: 23d7a18c-fe57-47a0-afbc-1e1a5305c0eb
State: Peer in Cluster (Connected)
2. #gluster peer detach dbstore4r294
Brick(s) with the peer dbstore4r294 exist in cluster
3. #gluster volume info
Volume Name: test-volume
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: dbstore1r293:/datastore1
Brick2: dbstore2r293:/datastore1
Brick3: dbstore3r294:/datastore1
Brick4: dbstore4r294:/datastore1
Brick5: dbstore1r293:/datastore2
Brick6: dbstore2r293:/datastore2
Brick7: dbstore3r294:/datastore2
Brick8: dbstore4r294:/datastore2
Options Reconfigured:
network.ping-timeout: 42s
performance.cache-size: 64MB
performance.write-behind-window-size: 3MB
performance.io-thread-count: 8
performance.cache-refresh-timeout: 2
Note that the non-existent node/peer is dbstore4r294 (its bricks are
/datastore1 and /datastore2, i.e. Brick4 and Brick8).
4. #gluster volume remove-brick test-volume dbstore4r294:/datastore1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove brick incorrect brick count of 1 for replica 2
5. #gluster volume remove-brick test-volume dbstore4r294:/datastore1 dbstore4r294:/datastore2
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Bricks not from same subvol for replica
How do I remove the peer? What are the steps considering that
the node
is non-existent?
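The two errors in steps 4 and 5 are consistent with the way replica sets are usually formed. Assuming that consecutive bricks in the listed order make up each replica pair (an assumption about this volume; the output above does not state it outright), the two bricks on dbstore4r294 sit in different pairs, each mirrored on dbstore3r294. A minimal Python sketch of that grouping, with hypothetical helper names:

```python
# Brick order copied from the `gluster volume info` output above.
BRICKS = [
    "dbstore1r293:/datastore1",
    "dbstore2r293:/datastore1",
    "dbstore3r294:/datastore1",
    "dbstore4r294:/datastore1",
    "dbstore1r293:/datastore2",
    "dbstore2r293:/datastore2",
    "dbstore3r294:/datastore2",
    "dbstore4r294:/datastore2",
]


def replica_sets(bricks, replica):
    """Group bricks into sets of `replica` consecutive bricks.

    Assumption: replica sets follow the listed brick order.
    """
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_containing(bricks, replica, host):
    """Return the replica sets holding at least one brick on `host`."""
    return [
        s for s in replica_sets(bricks, replica)
        if any(b.split(":", 1)[0] == host for b in s)
    ]


if __name__ == "__main__":
    # Show which brick mirrors each brick of the dead node.
    for pair in sets_containing(BRICKS, 2, "dbstore4r294"):
        print(" <-> ".join(pair))
```

Under that assumption, removing a single brick breaks the multiple-of-2 brick count (step 4), and removing both dbstore4r294 bricks together takes one brick from each of two different pairs (step 5).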
Do you plan to replace the dead server with a new server? If so,
this could be a possible sequence of steps:
No. We are not going to replace it. So, I need to resize it to a 3-node
cluster.
I discovered the issue when one of the nodes hung and I had to reboot
it. I expected the Gluster volume to stay available through a single node
failure, but the volume was non-responsive.
Surprised at that, I checked the details and found it had been running
with one node missing for many months now, perhaps a year!
I have no node to replace it with. So, I am looking for a method by
which I can resize it.
The problem is that you want to run a replica 2 volume on an odd number
of servers. This can be done, but it requires that you think of bricks
individually rather than tying sets of bricks to servers. Your goal is
simply to have each pair of replica bricks on two different servers.
See
http://joejulian.name/blog/how-to-expand-glusterfs-replicated-clusters-by-one-server/
for an example.
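As a rough illustration of "each pair of replica bricks on two different servers", here is a small Python sketch that chains replica pairs around the three surviving servers. The function name and the choice of three pairs are hypothetical, picked only for illustration; this models the placement idea and is not a Gluster command sequence.

```python
def chained_pairs(servers, n_pairs):
    """Place each replica pair on two adjacent servers, wrapping around.

    Pair i lives on servers[i] and servers[i + 1] (modulo the server
    count), so the two bricks of a pair never share a server.
    """
    if len(servers) < 2:
        raise ValueError("replica 2 needs at least two servers")
    n = len(servers)
    return [(servers[i % n], servers[(i + 1) % n]) for i in range(n_pairs)]


# The three surviving hosts from the volume info in this thread.
SERVERS = ["dbstore1r293", "dbstore2r293", "dbstore3r294"]

if __name__ == "__main__":
    for first, second in chained_pairs(SERVERS, 3):
        print(f"{first} <-> {second}")
```

With three servers and three pairs, every server ends up holding two bricks, each belonging to a different pair, so losing any one server still leaves one copy of every pair online.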
_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users