Been running through my eternal testing regime ... and experimenting with removing/adding bricks - to me, a necessary part of volume maintenance for dealing with failed disks. The datastore hosts VM images and everything below was done live, with the VMs running. Sharding is enabled with a 512MB shard size.
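
For reference, sharding was enabled with the usual volume options - something along these lines (recreated from memory, not a copy-paste of my shell history):

   gluster volume set datastore1 features.shard on
   gluster volume set datastore1 features.shard-block-size 512MB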

So I started off with a replica 3 volume:

   // recreated from memory
   Volume Name: datastore1
   Type: Replicate
   Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
   Status: Started
   Number of Bricks: 1 x 3 = 3
   Transport-type: tcp
   Bricks:
   Brick1: vnb.proxmox.softlog:/vmdata/datastore1
   Brick2: vng.proxmox.softlog:/vmdata/datastore1
   Brick3: vna.proxmox.softlog:/vmdata/datastore1



I remove a brick with:

gluster volume remove-brick datastore1 replica 2 vng.proxmox.softlog:/vmdata/datastore1 force

so we end up with:

   Volume Name: datastore1
   Type: Replicate
   Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
   Status: Started
   Number of Bricks: 1 x 2 = 2
   Transport-type: tcp
   Bricks:
   Brick1: vna.proxmox.softlog:/vmdata/datastore1
   Brick2: vnb.proxmox.softlog:/vmdata/datastore1



All well and good. No heal issues, VMs running OK.

Then I clean the brick off the vng host:

rm -rf /vmdata/datastore1
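
Side note: since I delete the directory outright, there should be nothing left behind. If I were reusing the directory instead of deleting it, my understanding is that the stale GlusterFS metadata would need stripping first, roughly like this (a sketch, same brick path assumed):

   # only needed when reusing an existing brick directory rather than deleting it
   setfattr -x trusted.glusterfs.volume-id /vmdata/datastore1
   setfattr -x trusted.gfid /vmdata/datastore1
   rm -rf /vmdata/datastore1/.glusterfs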


I then add the brick back with:

   gluster volume add-brick datastore1 replica 3 vng.proxmox.softlog:/vmdata/datastore1

which gives:

   Volume Name: datastore1
   Type: Replicate
   Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
   Status: Started
   Number of Bricks: 1 x 3 = 3
   Transport-type: tcp
   Bricks:
   Brick1: vna.proxmox.softlog:/vmdata/datastore1
   Brick2: vnb.proxmox.softlog:/vmdata/datastore1
   Brick3: vng.proxmox.softlog:/vmdata/datastore1



This recreates the brick directory "datastore1". Unfortunately, this is where things start to go wrong :( Heal info shows entries on all three bricks:

   gluster volume heal datastore1 info
   Brick vna.proxmox.softlog:/vmdata/datastore1
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
   Number of entries: 2

   Brick vnb.proxmox.softlog:/vmdata/datastore1
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
   Number of entries: 2

   Brick vng.proxmox.softlog:/vmdata/datastore1
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.6
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
   /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5


It's my understanding that there shouldn't be any heal entries on vng, as that is where all the shards should be sent *to*.
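
If it helps with diagnosis, the AFR changelog xattrs on one of the listed shards should show which copies each brick considers pending. A sketch of how I check it (run on each brick host against its local brick path, shard name taken from the heal output above):

   getfattr -d -m trusted.afr -e hex \
       /vmdata/datastore1/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5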

Also, running qemu-img check on the hosted VM images results in an I/O error. Eventually the VMs themselves crash - I suspect this is due to individual shards being unreadable.
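
For reference, the check is run against the images on the gluster mount, roughly like this (the mount point and image name below are placeholders, not my actual paths):

   qemu-img check /mnt/pve/datastore1/images/100/vm-100-disk-1.qcow2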

Another odd behaviour: if I run a full heal on vnb, I get the following error:

   Launching heal operation to perform full self heal on volume
   datastore1 has been unsuccessful


However, if I run it on vna, it succeeds.
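
The full heal command in both cases, for reference:

   gluster volume heal datastore1 full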


Lastly - if I remove the brick again, everything returns to normal immediately. Heal info shows no issues and qemu-img check returns no errors.
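
i.e. dropping back to replica 2 with the same remove-brick as above:

   gluster volume remove-brick datastore1 replica 2 vng.proxmox.softlog:/vmdata/datastore1 force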




--
Lindsay Mathieson
