On 08/08/13 13:09, Krishnan Parthasarathi wrote:
Hi Toby,
----- Original Message -----
Hi,
I'm getting some confusing "Incorrect brick" errors when attempting to
remove OR replace a brick.
gluster> volume info condor
Volume Name: condor
Type: Replicate
Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mel-storage01:/srv/brick/condor
Brick2: mel-storage02:/srv/brick/condor
gluster> volume remove-brick condor replica 1
mel-storage02:/srv/brick/condor start
Incorrect brick mel-storage02:/srv/brick/condor for volume condor
If that is the incorrect brick, then what have I done wrong?
I agree that the error message displayed is far from helpful. Your attempt to
remove a brick from a 1x2 replicate volume failed because it is not a 'legal'
operation.
Here are some rules, and the background behind them, that implicitly determine
whether a remove-brick operation is allowed. Some may seem debatable, but
that is how things stand today. We could refine them and evolve a better set of
rules via discussion on the mailing lists.
1) The remove-brick 'start' variant is applicable *only* to volumes of the dht
(or distribute) type. In 3.3, you can identify such a volume from the output of
"gluster volume info <VOLNAME>": the "Type" field displays
"Distribute-<something>". Additionally, even in a Distribute type volume (which
includes Distribute-Replicate, Distribute-Stripe and other combinations), all
the bricks belonging to a subvolume must be removed in one go.
For example, let's assume a 2x2 volume V1 with bricks b1, b2, b3, b4, such that
b1,b2 form one replica pair and b3,b4 form the other.
If you wanted to use the remove-brick start variant, say to scale down the
volume, you would do the following:
#gluster volume remove-brick V1 b3 b4 start
#gluster volume remove-brick V1 b3 b4 status
Once the status output shows the operation has completed:
#gluster volume remove-brick V1 b3 b4 commit
This leaves volume V1 with bricks b1,b2. In the above workflow, the data
residing on b3,b4 is migrated to b1,b2 before the bricks are removed.
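To make that concrete with hostnames in the style of Toby's volume (the
mel-storage03/04 brick paths here are hypothetical, purely for illustration),
the same scale-down would look like:
#gluster volume remove-brick V1 mel-storage03:/srv/brick/condor mel-storage04:/srv/brick/condor start
#gluster volume remove-brick V1 mel-storage03:/srv/brick/condor mel-storage04:/srv/brick/condor status
#gluster volume remove-brick V1 mel-storage03:/srv/brick/condor mel-storage04:/srv/brick/condor commit
Note that both bricks of the replica pair appear in every command, since the
whole subvolume must be removed together.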
2) remove-brick (without the 'start' subcommand) can be used to reduce the
replica count of a Distribute-Replicate volume, down to a minimum replica count
of 2. As of today, remove-brick doesn't permit reducing the replica count of a
pure replicate volume, i.e. 1xN, where N >= 2.
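As a sketch of rule 2 (the volume and brick names are hypothetical): on a 2x3
Distribute-Replicate volume V1 with replica sets (b1,b2,b3) and (b4,b5,b6),
dropping to replica 2 would remove one brick from each replica set in a single
command, along the lines of:
#gluster volume remove-brick V1 replica 2 b3 b6
This works because the volume remains Distribute-Replicate (2x2) afterwards,
whereas the same reduction on a pure 1xN replicate volume is refused.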
Note: There is some activity around evolving the 'right' rule. See
http://review.gluster.com/#/c/5364/
The above rules evolved from the principle that no legal command should let the
user shoot herself in the foot without a 'repair' path. Put differently, we
disallow commands that might lead to data loss without the user being fully
aware of it.
Hope that helps,
krish
Well, it's a bit of a moot point now, since we had to rebuild the
cluster anyway.
Note that we attempted to raise the replica count to 3 and THEN remove
the old brick, and that failed. We also tried using replace-brick to
swap the old brick out for the new one; that also failed with
"Incorrect brick". (The replace-brick method was actually the first
thing we tried.)
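(The replace-brick attempt would presumably have been of this form, with the
replacement hostname here being hypothetical:
#gluster volume replace-brick condor mel-storage02:/srv/brick/condor mel-storage03:/srv/brick/condor start
and it was this that returned the "Incorrect brick" error.)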
As such, it seems there is no way to replace a failed server with a
new one if you're using a replicated setup?
Toby
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users