I'm going to stop debugging this, as I still can't find a reliable way to reproduce the problem. I ran 4 or 5 rounds of tests (all from scratch) yesterday and today and hit the problem only once, on Monday afternoon; repeating the same steps didn't give the same result. I also checked the logs: nothing looked wrong, except that the rebalance was running on the wrong bricks.

I will raise this again if I get any useful information.

-C.B.

On 8/13/2013 7:00 AM, Cool wrote:
Thanks Ravi. I managed to reproduce the issue twice in the past several days, but found nothing significant in the logs. 'gluster volume info' before and after showed the correct information (i.e. sdd1 was removed, even though its data was not migrated out), while rebalance.log said it was migrating data out of sdc1, not sdd1.

I'm trying again now with -L TRACE to get more log information. This will take some time; I'll post here if I find anything helpful.

-C.B.
On 8/13/2013 6:49 AM, Ravishankar N wrote:
On 08/13/2013 06:21 PM, Cool wrote:
I'm pretty sure I ran "watch ... remove-brick ... status" until it reported that everything had completed before triggering the commit; I should have made that clear in my previous mail.

If you read my mail again: in step #5, files on /sdc1 were migrated instead of those on /sdd1, even though my command was removing /sdd1.
Ah, my bad. Got it now. This is strange.
To me, this is the root cause of the problem: data on /sdc1 was migrated to /sdb1 and /sdd1, and then the commit simply removed /sdd1 from gfs_v0. It seems the volume definition got into a bad state inside gluster.
If you are able to reproduce the issue, does 'gluster volume info' show the correct bricks before and after the start-status-commit sequence for removing sdd1? You could also check for error messages in /var/log/glusterfs/<volname>-rebalance.log.
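A minimal shell sketch of that before/after comparison follows. It assumes the usual 'gluster volume info' layout, where each brick appears on its own line as "BrickN: host:/path"; the function name brick_list and the bricks.before/bricks.after file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compare the brick list of a volume before and after a
# remove-brick cycle. Assumes `gluster volume info` prints one
# "BrickN: host:/path" line per brick (brick_list is a hypothetical helper).
set -euo pipefail

# brick_list: read `gluster volume info` output on stdin and print each
# brick as host:/path, one per line, in the order gluster lists them.
# The "Bricks:" and "Number of Bricks:" header lines are not matched.
brick_list() {
  sed -n 's/^Brick[0-9][0-9]*: *//p'
}

# Example usage (file names are arbitrary):
#   gluster volume info gfs_v0 | brick_list > bricks.before
#   ... remove-brick start / status / commit for the sdd1 pair ...
#   gluster volume info gfs_v0 | brick_list > bricks.after
#   diff bricks.before bricks.after   # expect only the two sdd1 bricks to disappear
```

Keeping the order from 'volume info' matters here, because in a replica 2 volume adjacent bricks form a replica pair.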

-Ravi

-C.B.

On 8/12/2013 9:51 PM, Ravishankar N wrote:
On 08/13/2013 03:43 AM, Cool wrote:
remove-brick in 3.4.0 seems to be removing the wrong bricks. Can someone review the environment and steps to see if I did anything stupid?

setup - Ubuntu 12.04 LTS on gfs11 and gfs12; both nodes have 3 XFS partitions (sdb1, sdc1, sdd1) and the following packages from the PPA:

ii  glusterfs-client  3.4.0final-ubuntu1~precise1  clustered file-system (client package)
ii  glusterfs-common  3.4.0final-ubuntu1~precise1  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.4.0final-ubuntu1~precise1  clustered file-system (server package)

Steps to reproduce the problem:
1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
2. add-brick gfs11:/sdc1 and gfs12:/sdc1
3. add-brick gfs11:/sdd1 and gfs12:/sdd1
4. rebalance so that files are distributed across all three pairs of disks
5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start, files on ***/sdc1*** are migrating out
6. remove-brick commit led to data loss in gfs_v0

If, between steps 5 and 6, I start a remove-brick targeting /sdc1, then I don't lose anything after the commit, since all data is migrated back to /sdb1.


You should ensure that a 'remove-brick start' has completed, and commit it, before initiating the second one. The correct way to do this would be:
5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
6. Check that the data migration has been completed using the status command:
   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
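For reference, the start/status/commit steps above (5-7, and likewise 8-10) can be wrapped in a small shell sketch that only commits once the status command stops reporting a running migration. The status keywords it matches ("in progress", "not started", "completed", "failed", "stopped") are an assumption about the remove-brick status table, not checked against every release; remove_pair and migration_state are hypothetical names. The commit uses the CLI's --mode=script option to skip the interactive y/n confirmation.

```shell
#!/usr/bin/env bash
# Sketch only: drain one replica pair with remove-brick and commit only after
# every node reports the migration as completed. The status keywords matched
# below are an assumption about the `remove-brick ... status` output format.
set -euo pipefail

# migration_state: classify `remove-brick ... status` output read on stdin
# as done, running, failed or unknown.
migration_state() {
  local out
  out=$(cat)
  if grep -qiE 'failed|stopped' <<<"$out"; then
    echo failed
  elif grep -qiE 'in progress|not started' <<<"$out"; then
    echo running
  elif grep -qi 'completed' <<<"$out"; then
    echo done
  else
    echo unknown
  fi
}

# remove_pair VOLUME BRICK...: start, poll status every 10s, then commit.
remove_pair() {
  local vol=$1
  shift
  local state
  gluster volume remove-brick "$vol" "$@" start
  while :; do
    state=$(gluster volume remove-brick "$vol" "$@" status | migration_state)
    case $state in
      done) break ;;
      failed|unknown)
        echo "remove-brick status is '$state'; not committing" >&2
        return 1 ;;
    esac
    sleep 10
  done
  # Show the brick list once more before the irreversible commit.
  gluster volume info "$vol"
  # --mode=script suppresses the interactive y/n confirmation.
  gluster --mode=script volume remove-brick "$vol" "$@" commit
}

# Example: remove_pair gfs_v0 gfs11:/sdd1 gfs12:/sdd1
#          remove_pair gfs_v0 gfs11:/sdc1 gfs12:/sdc1
```

Running the two pairs strictly one after the other mirrors the advice above: the second remove-brick is never started while the first is still uncommitted.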

This would leave you with the original replica 2 volume that you had begun with. Hope this helps.

Note:
The latest version of glusterfs has a check that prevents a second remove-brick operation from starting until the first one has been committed or stopped. (You would receive a message like this: "volume remove-brick start: failed: An earlier remove-brick task exists for volume <volname>. Either commit it or stop it before starting a new task.")

-Ravi


-C.B.
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
