I had the exact same experience recently with a 3.4 distributed cluster I set up. I spent some time on IRC but couldn’t track it down. It seems remove-brick is broken in 3.3 and 3.4. I guess folks don’t remove bricks very often :)
- brian

On Oct 30, 2013, at 11:21 AM, Lalatendu Mohanty <[email protected]> wrote:

> On 10/30/2013 08:40 PM, Lalatendu Mohanty wrote:
>> On 10/30/2013 03:43 PM, B.K.Raghuram wrote:
>>> I have gluster 3.4.1 on 4 boxes with hostnames n9, n10, n11, n12. I
>>> did the following sequence of steps and ended up losing data, so
>>> what did I do wrong?!
>>>
>>> - Created a distributed volume with bricks on n9 and n10
>>> - Started the volume
>>> - NFS-mounted the volume and created 100 files on it. Found that n9
>>> had 45, n10 had 55
>>> - Added a brick on n11 to this volume
>>> - Removed the n10 brick from the volume with gluster remove brick <vol>
>>> <n10 brick name> start
>>> - n9 now has 45 files, n10 has 55 files and n11 has 45 files (all the
>>> same as on n9)
>>> - Checked status; it showed no rebalanced files, but that n10 had
>>> scanned 100 files and completed. 0 scanned for all the others
>>> - I then did a rebalance start force on the volume and found that n9 had
>>> 0 files, n10 had 55 files and n11 had 45 files - weird - it looked like
>>> n9 had been removed, but I double-checked and found that n10 had
>>> indeed been removed.
>>> - Did a remove-brick commit. The file distribution is the same after that.
>>> volume info now shows the volume to have n9 and n11 as bricks.
>>> - Did a rebalance start again on the volume. The rebalance status now
>>> shows n11 had 45 rebalanced files, all the brick nodes had 45 files
>>> scanned, and all show complete. The file layout after this is: n9 has 45
>>> files and n10 has 55 files. n11 has 0 files!
>>> - An ls on the NFS mount now shows only 45 files, so the other 55 are not
>>> visible because they are on n10, which is no longer part of the volume!
>>>
>>> What have I done wrong in this sequence?
>>
>> I think running rebalance (force) in between "remove-brick start" and
>> "remove-brick commit" is the issue. Can you please paste your commands as per
>> the timeline of events? That would make it clearer.
>>
>> Below are the steps I follow to replace a brick, and they work for me:
>>
>> gluster volume add-brick VOLNAME NEW-BRICK
>> gluster volume remove-brick VOLNAME BRICK start
>> gluster volume remove-brick VOLNAME BRICK status
>> gluster volume remove-brick VOLNAME BRICK commit
>
> I would also suggest using distribute-replicate volumes, so that you always
> have a replica copy, which reduces the probability of losing data.
>
> -Lala
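
For reference, here is a minimal sketch of the decommissioning sequence Lala lists above, assuming a volume named testvol and brick paths like n10:/export/brick1 (placeholder names, not taken from the thread):

# "testvol" and the brick paths below are placeholders, not from the thread.
# Add the brick that will take over the data.
gluster volume add-brick testvol n11:/export/brick1

# Start draining the brick to be removed; this migrates its files to the
# remaining bricks in the background.
gluster volume remove-brick testvol n10:/export/brick1 start

# Poll until the drained brick reports "completed".
gluster volume remove-brick testvol n10:/export/brick1 status

# Only then commit. Committing before the migration finishes, or running a
# separate "rebalance ... start force" in between, can leave files stranded
# on the removed brick.
gluster volume remove-brick testvol n10:/export/brick1 commit

# Optionally fix up the layout across the remaining bricks afterwards.
gluster volume rebalance testvol start
gluster volume rebalance testvol status

The point of the start/status/commit split is that the data migration happens between start and commit, and commit should only be issued once status reports the drain as completed.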
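And a sketch of the distribute-replicate layout Lala suggests, again with illustrative volume and brick names: listing four bricks with replica 2 creates a 2 x 2 volume in which each file is stored on a pair of bricks, so a single failed or removed brick does not lose data.

# Volume name "dr-vol" and brick paths are placeholders.
# Bricks are paired in the order given: (n9, n10) and (n11, n12) form the
# replica sets; files are distributed across the two pairs.
gluster volume create dr-vol replica 2 \
    n9:/export/brick1 n10:/export/brick1 \
    n11:/export/brick1 n12:/export/brick1
gluster volume start dr-vol
gluster volume info dr-vol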
