Hi folks,

I've run into trouble after moving an arbiter brick to another server because of I/O load issues. My setup is as follows:
# gluster volume info

Volume Name: myvol
Type: Distributed-Replicate
Volume ID: 43ba517a-ac09-461e-99da-a197759a7dc8
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: gv0:/data/glusterfs
Brick2: gv1:/data/glusterfs
Brick3: gv4:/data/gv01-arbiter (arbiter)
Brick4: gv2:/data/glusterfs
Brick5: gv3:/data/glusterfs
Brick6: gv1:/data/gv23-arbiter (arbiter)
Brick7: gv4:/data/glusterfs
Brick8: gv5:/data/glusterfs
Brick9: pluto:/var/gv45-arbiter (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.owner-gid: 1000
storage.owner-uid: 1000
cluster.self-heal-daemon: enable

The gv23-arbiter is the brick that was recently moved from another server (chronos) using the following command:

# gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter gv1:/data/gv23-arbiter commit force
volume replace-brick: success: replace-brick commit force operation successful

This is not the first time I have moved an arbiter brick, and the heal-count was zero for all bricks before the change, so I didn't expect much trouble. What probably went wrong is that I then forced chronos out of the cluster with the gluster peer detach command (the exact invocations are reconstructed in the P.S. below). Ever since then, over the course of the last 3 days, I have been seeing this:

# gluster volume heal myvol statistics heal-count
Gathering count of entries to be healed on volume myvol has been successful

Brick gv0:/data/glusterfs
Number of entries: 0

Brick gv1:/data/glusterfs
Number of entries: 0

Brick gv4:/data/gv01-arbiter
Number of entries: 0

Brick gv2:/data/glusterfs
Number of entries: 64999

Brick gv3:/data/glusterfs
Number of entries: 64999

Brick gv1:/data/gv23-arbiter
Number of entries: 0

Brick gv4:/data/glusterfs
Number of entries: 0

Brick gv5:/data/glusterfs
Number of entries: 0

Brick pluto:/var/gv45-arbiter
Number of entries: 0

According to /var/log/glusterfs/glustershd.log, self-healing is in progress, so it might be worth just sitting and waiting, but I'm wondering why this heal-count of 64999 persists. Is it a limitation of the counter? The gv2 and gv3 bricks do contain roughly 30 million files (a rough cross-check idea is in the P.P.S. below). I'm also bothered by the following output:

# gluster volume heal myvol info heal-failed
Gathering list of heal failed entries on volume myvol has been unsuccessful on bricks that are down. Please check if all brick processes are running.

I have since attached the chronos server back to the cluster, with no noticeable effect. Any comments and suggestions would be much appreciated.

--
Best Regards,

Seva Gluschenko
CTO @ http://webkontrol.ru
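P.S. For reference, the peer commands in question, reconstructed from memory (the exact invocations may have differed slightly), were along these lines:

# gluster peer detach chronos force   <-- forcing chronos out of the cluster
# gluster peer probe chronos          <-- attaching it back afterwards
# gluster peer status                 <-- verifying the resulting peer list

P.P.S. To cross-check whether 64999 is a display cap rather than the true backlog, I suppose one could count the raw listing instead, assuming it is not subject to the same limit (with tens of thousands of pending entries this would be slow, but the line count should be dominated by the per-entry lines):

# gluster volume heal myvol info | wc -l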
