Re: [Gluster-users] File Corruption when adding bricks to live replica volumes

Krutika Dhananjay Tue, 19 Jan 2016 04:07:33 -0800

Hi Lindsay, 

Just to be sure we are not missing any steps here, you did invoke 'gluster 
volume heal datastore1 full' after adding the third brick, before the heal 
could begin, right?


As for the reverse heal, there is a known issue with add-brick when the 
replica count is increased, and it is still under review. 
Could you instead try the following steps at the time of add-brick and tell me 
whether they work: 

1. Run 'gluster volume add-brick datastore1 replica 3 
vng.proxmox.softlog:/vmdata/datastore1' as usual. 
2. Kill the glusterfsd process corresponding to the newly added brick (the brick 
on vng in your case). You should be able to get its pid from the output of 
'gluster volume status datastore1'. 
3. Create a dummy file in the root of the volume from the mount point. It can 
have any name. 
4. Delete the dummy file created in step 3. 
5. Bring the killed brick back up. For this, you can run 'gluster volume start 
datastore1 force'. 
6. Then execute 'gluster volume heal datastore1 full' on the node with the 
highest uuid (we know how to do this from the previous thread on the same 
topic). 
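
For convenience, the steps above could be sketched as a small shell script. 
This is only an illustration, not a tested tool: the mount point 
/mnt/datastore1, the dummy file name and the helper names (run, add_new_brick, 
finish_workaround) are assumptions that do not come from this thread. It 
defaults to a dry run that only prints the commands, and the brick pid is 
passed in by hand after reading it from the status output. 

```shell
#!/bin/sh
# Sketch of the add-brick workaround. Values marked "assumption" are illustrative.
VOL=datastore1
NEW_BRICK=vng.proxmox.softlog:/vmdata/datastore1
MOUNT=${MOUNT:-/mnt/datastore1}    # assumption: client mount point of the volume
DUMMY=${DUMMY:-.add-brick-dummy}   # assumption: any unused file name will do
DRY_RUN=${DRY_RUN:-1}              # 1 = print the commands instead of running them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1, followed by the status output needed to look up the new brick's pid.
add_new_brick() {
    run gluster volume add-brick "$VOL" replica 3 "$NEW_BRICK"
    run gluster volume status "$VOL"
}

# Steps 2-6. $1 is the pid of the glusterfsd process of the new brick.
finish_workaround() {
    run kill "$1"                           # step 2: stop the new brick
    run touch "$MOUNT/$DUMMY"               # step 3: create a dummy file
    run rm "$MOUNT/$DUMMY"                  # step 4: delete it again
    run gluster volume start "$VOL" force   # step 5: bring the brick back up
    run gluster volume heal "$VOL" full     # step 6: run on the node with the highest uuid
}

# Dry-run demonstration; BRICK_PID is a placeholder, not a real pid.
add_new_brick
finish_workaround BRICK_PID
```

To run it for real, set DRY_RUN=0, call add_new_brick, read the pid of the vng 
brick from the status output, and then call finish_workaround with that pid. 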

Then monitor heal-info output to track heal progress. 
Let me know if this works. 
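
If it helps with the monitoring, here is a rough sketch that summarises the 
heal-info output. It assumes the output format shown in your mail ("Brick 
<host>:<path>" lines followed by "Number of entries: N"); the function names 
and the 30-second default interval are my own choices. 

```shell
#!/bin/sh
# Sum all "Number of entries: N" lines of heal-info output read from stdin.
pending_entries() {
    awk '/^Number of entries:/ { total += $4 } END { print total + 0 }'
}

# Print "<brick> <count>" for each brick in heal-info output read from stdin.
# A non-zero count on the freshly added brick would match the symptom reported
# in the original mail.
entries_per_brick() {
    awk '/^Brick / { brick = $2 }
         /^Number of entries:/ { print brick, $4 }'
}

# Poll heal info until no entries are pending.
# $1 = volume name, $2 = seconds between polls (assumption: defaults to 30).
watch_heal() {
    while :; do
        out=$(gluster volume heal "$1" info) || return 1   # stop if the CLI fails
        printf '%s\n' "$out" | entries_per_brick
        n=$(printf '%s\n' "$out" | pending_entries)
        echo "$(date): $n entries pending"
        [ "$n" -eq 0 ] && break
        sleep "${2:-30}"
    done
}
```

Calling 'watch_heal datastore1' on one of the nodes would then print the 
per-brick counts until the total drops to zero. 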

-Krutika 

----- Original Message -----

> From: "Lindsay Mathieson" <[email protected]>
> To: "gluster-users" <[email protected]>
> Sent: Tuesday, January 19, 2016 4:54:07 PM
> Subject: [Gluster-users] File Corruption when adding bricks to live replica
> volumes

> gluster 3.7.6

> I seem to be able to reliably reproduce this. I have a replica 2 volume with
> 1 test VM image. While the VM is running with heavy disk read/writes (disk
> benchmark) I add a 3rd brick for replica 3:

> gluster volume add-brick datastore1 replica 3
> vng.proxmox.softlog:/vmdata/datastore1

> I pretty much immediately get this:

> > gluster volume heal datastore1 info
> >
> > Brick vna.proxmox.softlog:/vmdata/datastore1
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly undergoing heal
> > /images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal
> > Number of entries: 4
> >
> > Brick vnb.proxmox.softlog:/vmdata/datastore1
> > /images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly undergoing heal
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
> > Number of entries: 4
> >
> > Brick vng.proxmox.softlog:/vmdata/datastore1
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.16
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.28
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.77
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.9
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.2
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.26
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.13
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.3
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
> > Number of entries: 13

> The brick on vng is the new empty brick, but it has 13 shards being healed
> back to vna & vnb. That can't be right, and if I leave it the VM becomes
> hopelessly corrupted. Also, there are 81 shards in the file; they should all
> be queued for healing.

> Additionally I get read errors when I run a qemu-img check on the VM image.
> If I remove the vng brick the problems are resolved.

> If I do the same process while the VM is not running - i.e. no files are being
> accessed - everything proceeds as expected. All shards on vna & vnb are healed
> to vng.

> --
> Lindsay Mathieson

> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://www.gluster.org/mailman/listinfo/gluster-users
