On 08/15/2013 11:11 AM, Cool wrote:
I'm going to stop debugging this, as I still cannot figure out how to
reproduce the problem reliably. I did 4-5 rounds of tests
(all from scratch) yesterday and today and only hit the problem once,
on Monday afternoon; repeating the same steps didn't give me the same
result. I also checked the logs: nothing looked wrong except that the
rebalance was happening on the wrong bricks.
I will raise this again if I find any useful information.
-C.B.
Sure C.B, thanks for your efforts.
On 8/13/2013 7:00 AM, Cool wrote:
Thanks Ravi, I managed to reproduce the issue twice in the past
several days, but with nothing significant in the logs. volume info
before and after shows correct information (i.e. sdd1 got removed even
though its data was not migrated out), while rebalance.log says it was
migrating data out of sdc1, not sdd1.
I'm doing another try now with -L TRACE to see if I can get more log
information. This will take some time; I will post here if I find
anything helpful.
-C.B.
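One way to confirm which brick is actually draining, independent of rebalance.log, is to count the files on each brick while the remove-brick is running. The following is only a sketch: the function names are made up for illustration, and it assumes the bricks are mounted at the paths used in this thread and that the internal .glusterfs directory should be left out of the count.

```shell
#!/bin/sh
# Count regular files under a brick directory, skipping GlusterFS's
# internal .glusterfs directory so only user-visible files are counted.
count_brick_files() {
    find "$1" -name .glusterfs -prune -o -type f -print | wc -l | tr -d ' '
}

# Print one "<brick> <count>" line for each brick path given as an argument.
brick_counts() {
    for brick in "$@"; do
        printf '%s %s\n' "$brick" "$(count_brick_files "$brick")"
    done
}

# When run as a script with arguments, report on those bricks, e.g.
#   sh brick_counts.sh /sdb1 /sdc1 /sdd1     (hypothetical file name)
if [ "$#" -gt 0 ]; then
    brick_counts "$@"
fi
```

Running it every few seconds on each node during the remove-brick should show the count dropping on the brick that is really being migrated.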
On 8/13/2013 6:49 AM, Ravishankar N wrote:
On 08/13/2013 06:21 PM, Cool wrote:
I'm pretty sure I ran "watch ... remove-brick ... status" until it
reported everything as completed before triggering the commit; I
should have made that clear in my previous mail.
Actually, you can read my mail again: in step #5, files on /sdc1
got migrated instead of /sdd1, even though my command was trying to
remove-brick /sdd1,
Ah, my bad. Got it now. This is strange..
this is the root cause (to me) of the problem: data on
/sdc1 migrated to /sdb1 and /sdd1, then the commit simply removed
/sdd1 from gfs_v0. It seems the volume definition information went
wrong somewhere in gluster.
If you are able to reproduce the issue, does 'gluster volume info'
show the correct bricks before and after start-status-commit
operations of removing sdd1? You could also see if there are any
error messages in /var/log/glusterfs/<volname>-rebalance.log
-Ravi
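The before/after check suggested above could be scripted roughly as follows. This is a sketch, not a tested procedure: the helper names are invented here, the brick filter assumes that 'gluster volume info' prints bricks as "BrickN: host:/path" lines, and the log filter assumes the usual "] E [" marker for error-level lines in GlusterFS logs.

```shell
#!/bin/sh
# Keep only the "BrickN: host:/path" lines from 'gluster volume info'
# output read on stdin.
brick_lines() {
    grep -E '^Brick[0-9]+:'
}

# Print error-level lines from a GlusterFS log file
# (assumes error lines carry a "] E [" level marker).
log_errors() {
    grep -F '] E [' "$1"
}

# Usage on a live node (not executed here):
#   gluster volume info gfs_v0 | brick_lines > /tmp/bricks.before
#   ... remove-brick start / status / commit ...
#   gluster volume info gfs_v0 | brick_lines > /tmp/bricks.after
#   diff /tmp/bricks.before /tmp/bricks.after
#   log_errors /var/log/glusterfs/gfs_v0-rebalance.log
```

The diff should show only the sdd1 pair disappearing; any other difference would point at the volume definition rather than the migration.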
-C.B.
On 8/12/2013 9:51 PM, Ravishankar N wrote:
On 08/13/2013 03:43 AM, Cool wrote:
remove-brick in 3.4.0 seems to be removing the wrong bricks. Can
someone help review my environment/steps to see if I did anything
stupid?
Setup: Ubuntu 12.04 LTS on gfs11 and gfs12, with the following
packages from the PPA; both nodes have 3 xfs partitions (sdb1, sdc1,
sdd1):
ii glusterfs-client 3.4.0final-ubuntu1~precise1 clustered file-system (client package)
ii glusterfs-common 3.4.0final-ubuntu1~precise1 GlusterFS common libraries and translator modules
ii glusterfs-server 3.4.0final-ubuntu1~precise1 clustered file-system (server package)
Steps to reproduce the problem:
1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and
gfs12:/sdb1
2. add-brick gfs11:/sdc1 and gfs12:/sdc1
3. add-brick gfs11:/sdd1 and gfs12:/sdd1
4. rebalance so that files are distributed across all three pairs of
disks
5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start; files on
***/sdc1*** are migrating out
6. remove-brick commit leads to data loss in gfs_v0
If between steps 5 and 6 I initiate a remove-brick targeting
/sdc1, then after the commit I do not lose anything, since all data
is migrated back to /sdb1.
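For anyone trying to reproduce this, steps 1 to 5 map roughly onto the commands below. This is a sketch under assumptions: the `run` and `repro_steps` helpers are invented for illustration, the `volume start` line is not in the original steps but is presumably needed before the volume can be used, and commands are only printed unless DRY_RUN=0 is set. Step 6 (commit) is left out on purpose.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only on a disposable test cluster.
DRY_RUN=${DRY_RUN:-1}
VOL=gfs_v0

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_steps() {
    # step 1: replica 2 volume on the sdb1 pair
    run gluster volume create "$VOL" replica 2 gfs11:/sdb1 gfs12:/sdb1
    # assumed step, not listed in the mail
    run gluster volume start "$VOL"
    # steps 2 and 3: add the sdc1 and sdd1 pairs
    run gluster volume add-brick "$VOL" gfs11:/sdc1 gfs12:/sdc1
    run gluster volume add-brick "$VOL" gfs11:/sdd1 gfs12:/sdd1
    # step 4: spread existing files over all three pairs
    run gluster volume rebalance "$VOL" start
    # step 5: start draining the sdd1 pair
    run gluster volume remove-brick "$VOL" gfs11:/sdd1 gfs12:/sdd1 start
}
```

Calling `repro_steps` prints the six commands in order. On a real run the rebalance in step 4 has to finish, and test files have to be written to the mounted volume, before step 5 is meaningful.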
You should ensure that a 'remove-brick start' has completed and
then commit it before initiating the second one. The correct way
to do this would be:
5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
6. Check that the data migration has been completed using the
status command:
# gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
This would leave you with the original replica 2 volume that you
had begun with. Hope this helps.
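The start / status / commit sequence above can be wrapped so that the commit only happens once the status output reports completion. This is a rough sketch, not an official tool: the function names are invented, and the status words it looks for ("in progress", "not started", "failed", "completed") are an assumption about the CLI's status table that should be checked against the installed version.

```shell
#!/bin/sh
# Classify remove-brick status text read on stdin as
# failed, running, completed, or unknown.
# The matched status words are assumptions about the CLI output.
migration_state() {
    status_text=$(cat)
    if printf '%s\n' "$status_text" | grep -q 'failed'; then
        echo failed
    elif printf '%s\n' "$status_text" | grep -Eq 'in progress|not started'; then
        echo running
    elif printf '%s\n' "$status_text" | grep -q 'completed'; then
        echo completed
    else
        echo unknown
    fi
}

# Usage: remove_brick_safely VOLNAME BRICK...
# Starts the removal, polls status every 10 seconds, and commits only
# when every node reports completed. The commit may still prompt for
# confirmation on the terminal.
remove_brick_safely() {
    vol=$1
    shift
    gluster volume remove-brick "$vol" "$@" start || return 1
    while :; do
        state=$(gluster volume remove-brick "$vol" "$@" status | migration_state)
        case $state in
            completed) break ;;
            running) sleep 10 ;;
            *) echo "migration state: $state; not committing" >&2
               return 1 ;;
        esac
    done
    gluster volume remove-brick "$vol" "$@" commit
}
```

For example, `remove_brick_safely gfs_v0 gfs11:/sdd1 gfs12:/sdd1` followed by the same call for the sdc1 pair would correspond to steps 5 to 10. Note that this only guards against committing too early; it would not catch the case reported in this thread, where the wrong brick is drained.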
Note:
The latest version of glusterfs has a check that prevents a
second remove-brick operation until the first one has been committed.
You would receive a message like this: "volume remove-brick start:
failed: An earlier remove-brick task exists for volume <volname>.
Either commit it or stop it before starting a new task."
-Ravi
-C.B.
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users