Hi folks,

I've run into trouble after moving an arbiter brick to another server because of I/O load issues. My setup is as follows:
# gluster volume info

Volume Name: myvol
Type: Distributed-Replicate
Volume ID: 43ba517a-ac09-461e-99da-a197759a7dc8
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: gv0:/data/glusterfs
Brick2: gv1:/data/glusterfs
Brick3: gv4:/data/gv01-arbiter (arbiter)
Brick4: gv2:/data/glusterfs
Brick5: gv3:/data/glusterfs
Brick6: gv1:/data/gv23-arbiter (arbiter)
Brick7: gv4:/data/glusterfs
Brick8: gv5:/data/glusterfs
Brick9: pluto:/var/gv45-arbiter (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.owner-gid: 1000
storage.owner-uid: 1000
cluster.self-heal-daemon: enable

The gv23-arbiter is the brick that was recently moved from another server (chronos) using the following command:

# gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter gv1:/data/gv23-arbiter commit force
volume replace-brick: success: replace-brick commit force operation successful

This is not the first time I have moved an arbiter brick, and the heal-count was zero for all bricks before the change, so I didn't expect much trouble. What probably went wrong is that I then forced chronos out of the cluster with the gluster peer detach command (the exact invocations are reconstructed in the P.S. below). Ever since then, over the course of the last 3 days, I have been seeing this:

# gluster volume heal myvol statistics heal-count
Gathering count of entries to be healed on volume myvol has been successful

Brick gv0:/data/glusterfs
Number of entries: 0

Brick gv1:/data/glusterfs
Number of entries: 0

Brick gv4:/data/gv01-arbiter
Number of entries: 0

Brick gv2:/data/glusterfs
Number of entries: 64999

Brick gv3:/data/glusterfs
Number of entries: 64999

Brick gv1:/data/gv23-arbiter
Number of entries: 0

Brick gv4:/data/glusterfs
Number of entries: 0

Brick gv5:/data/glusterfs
Number of entries: 0

Brick pluto:/var/gv45-arbiter
Number of entries: 0

According to /var/log/glusterfs/glustershd.log, self-healing is in progress, so it might be worth just sitting and waiting, but I'm wondering why this heal-count of 64999 persists. Is it a limitation of the counter? The gv2 and gv3 bricks do contain roughly 30 million files (a rough cross-check idea is in the P.P.S. below). I'm also bothered by the following output:

# gluster volume heal myvol info heal-failed
Gathering list of heal failed entries on volume myvol has been unsuccessful on bricks that are down. Please check if all brick processes are running.

I have since attached the chronos server back to the cluster, with no noticeable effect. Any comments and suggestions would be much appreciated.

--
Best Regards,

Seva Gluschenko
CTO @ http://webkontrol.ru
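P.S. For reference, the peer commands in question, reconstructed from memory (the exact invocations may have differed slightly), were along these lines:

# gluster peer detach chronos force   <-- forcing chronos out of the cluster
# gluster peer probe chronos          <-- attaching it back afterwards
# gluster peer status                 <-- verifying the resulting peer list

P.P.S. To cross-check whether 64999 is a display cap rather than the true backlog, I suppose one could count the raw listing instead, assuming it is not subject to the same limit (with tens of thousands of pending entries this would be slow, but the line count should be dominated by the per-entry lines):

# gluster volume heal myvol info | wc -l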
