I had the exact same experience recently with a 3.4 distributed cluster I set up. I spent some time on IRC but couldn’t track it down. It seems remove-brick is broken in 3.3 and 3.4. I guess folks don’t remove bricks very often :)
- brian

On Oct 30, 2013, at 11:21 AM, Lalatendu Mohanty <[email protected]> wrote:

> On 10/30/2013 08:40 PM, Lalatendu Mohanty wrote:
>> On 10/30/2013 03:43 PM, B.K.Raghuram wrote:
>>> I have gluster 3.4.1 on 4 boxes with hostnames n9, n10, n11, n12. I
>>> did the following sequence of steps and ended up losing data, so
>>> what did I do wrong?!
>>>
>>> - Created a distributed volume with bricks on n9 and n10
>>> - Started the volume
>>> - NFS-mounted the volume and created 100 files on it. Found that n9
>>> had 45, n10 had 55
>>> - Added a brick on n11 to this volume
>>> - Removed the n10 brick from the volume with gluster remove brick <vol>
>>> <n10 brick name> start
>>> - n9 now has 45 files, n10 has 55 files and n11 has 45 files (all the
>>> same as on n9)
>>> - Checked status; it showed no rebalanced files, but that n10 had
>>> scanned 100 files and completed. 0 scanned for all the others
>>> - I then did a rebalance start force on the volume and found that n9 had
>>> 0 files, n10 had 55 files and n11 had 45 files - weird - it looked like
>>> n9 had been removed, but I double-checked and found that n10 had
>>> indeed been removed.
>>> - Did a remove-brick commit. The file distribution is the same after that.
>>> volume info now shows the volume to have n9 and n11 as bricks.
>>> - Did a rebalance start again on the volume. The rebalance status now
>>> shows n11 had 45 rebalanced files, all the brick nodes had 45 files
>>> scanned, and all show complete. The file layout after this is: n9 has 45
>>> files and n10 has 55 files. n11 has 0 files!
>>> - An ls on the NFS mount now shows only 45 files, so the other 55 are not
>>> visible because they are on n10, which is no longer part of the volume!
>>>
>>> What have I done wrong in this sequence?
>>
>> I think running rebalance (force) in between "remove-brick start" and
>> "remove-brick commit" is the issue. Can you please paste your commands as per
>> the timeline of events? That would make it clearer.
>>
>> Below are the steps I follow to replace a brick, and they work for me:
>>
>> gluster volume add-brick VOLNAME NEW-BRICK
>> gluster volume remove-brick VOLNAME BRICK start
>> gluster volume remove-brick VOLNAME BRICK status
>> gluster volume remove-brick VOLNAME BRICK commit
>
> I would also suggest using distribute-replicate volumes, so that you always
> have a replica copy, which reduces the probability of losing data.
>
> -Lala
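
For reference, here is a minimal sketch of the decommissioning sequence Lala lists above, assuming a volume named testvol and brick paths like n10:/export/brick1 (placeholder names, not taken from the thread):

# "testvol" and the brick paths below are placeholders, not from the thread.
# Add the brick that will take over the data.
gluster volume add-brick testvol n11:/export/brick1

# Start draining the brick to be removed; this migrates its files to the
# remaining bricks in the background.
gluster volume remove-brick testvol n10:/export/brick1 start

# Poll until the drained brick reports "completed".
gluster volume remove-brick testvol n10:/export/brick1 status

# Only then commit. Committing before the migration finishes, or running a
# separate "rebalance ... start force" in between, can leave files stranded
# on the removed brick.
gluster volume remove-brick testvol n10:/export/brick1 commit

# Optionally fix up the layout across the remaining bricks afterwards.
gluster volume rebalance testvol start
gluster volume rebalance testvol status

The point of the start/status/commit split is that the data migration happens between start and commit, and commit should only be issued once status reports the drain as completed.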
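And a sketch of the distribute-replicate layout Lala suggests, again with illustrative volume and brick names: listing four bricks with replica 2 creates a 2 x 2 volume in which each file is stored on a pair of bricks, so a single failed or removed brick does not lose data.

# Volume name "dr-vol" and brick paths are placeholders.
# Bricks are paired in the order given: (n9, n10) and (n11, n12) form the
# replica sets; files are distributed across the two pairs.
gluster volume create dr-vol replica 2 \
    n9:/export/brick1 n10:/export/brick1 \
    n11:/export/brick1 n12:/export/brick1
gluster volume start dr-vol
gluster volume info dr-vol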
