On 08/08/13 13:09, Krishnan Parthasarathi wrote:
Hi Toby,

----- Original Message -----
Hi,
I'm getting some confusing "Incorrect brick" errors when attempting to
remove OR replace a brick.

gluster> volume info condor

Volume Name: condor
Type: Replicate
Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mel-storage01:/srv/brick/condor
Brick2: mel-storage02:/srv/brick/condor

gluster> volume remove-brick condor replica 1 mel-storage02:/srv/brick/condor start
Incorrect brick mel-storage02:/srv/brick/condor for volume condor


If that is the incorrect brick, then what have I done wrong?

I agree that the error message displayed is far from helpful. Your attempt to
remove a brick from a 1X2 replicate volume failed because it is not a 'legal'
operation.

Here are some implicit rules, and some background, on how to determine whether a
remove-brick operation is allowed. Some of them may seem debatable, but that is
how things are today. We could refine them and evolve a better set of rules via
discussions on the mailing lists.

1) The remove-brick start variant is applicable *only* when you have a dht (or
distribute) type volume. In 3.3, you can identify that by looking at the output
of "gluster volume info <VOLNAME>": the "Type" field would display "Distribute"
or "Distribute-<something>", which covers Distribute-Replicate, Distribute-Stripe
and other combinations (see the abbreviated info output after this example).
Additionally, even in a Distribute type volume, all the bricks belonging to a
subvolume need to be removed in one go.
For example, let's assume a 2X2 volume V1, with bricks b1, b2, b3, b4, such that
b1,b2 form one pair and b3,b4 form the other pair.
If you wanted to use the remove-brick start variant, say for scaling down the
volume, you would do the following:

#gluster volume remove-brick V1 b3 b4 start
#gluster volume remove-brick V1 b3 b4 status

Once the remove-brick operation is completed,
#gluster volume remove-brick V1 b3 b4 commit

This would leave volume V1 with bricks b1,b2.

In the above workflow, the data residing in b3,b4 is migrated to
b1,b2.
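
(For contrast with the Replicate volume shown at the top of this mail, here is a
rough sketch of what "gluster volume info" output might look like for the
hypothetical 2X2 volume V1 above; the server names and brick paths are made up,
and the exact wording can differ slightly between releases:

Volume Name: V1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/srv/brick/b1
Brick2: server2:/srv/brick/b2
Brick3: server1:/srv/brick/b3
Brick4: server2:/srv/brick/b4

The things to look for are "Distribute" in the Type field and the "N x M" layout
in the brick count.)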

2) remove-brick (without the 'start' subcommand) can be used to reduce the
replica count, down to a minimum of 2, in a Distribute-Replicate type volume
(a short example follows this note). As of today, remove-brick doesn't permit
reducing the replica count in a pure replicate volume, i.e. 1XN, where N >= 2.
Note: There is some activity around evolving the 'right' rule. See
http://review.gluster.com/#/c/5364/
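
For example, here is a minimal sketch of rule 2, assuming a hypothetical 2X3
Distribute-Replicate volume V2 with bricks b1 through b6, where b1,b2,b3 form one
replica set and b4,b5,b6 the other (the volume and brick names are made up).
Dropping the replica count from 3 to 2 removes one brick from each replica set;
depending on the release, the CLI may ask for confirmation or require an explicit
'force':

#gluster volume remove-brick V2 replica 2 b3 b6
#gluster volume info V2

The new replica count is stated explicitly, and exactly one brick per replica set
is named.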

The above rules evolved from the principle that no legal command should allow the
user to shoot herself in the foot without a 'repair' path. Put differently, we
disallow commands that might lead to data loss without the user being fully aware
of it.

Hope that helps,
krish


Well, it's a bit of a moot point now, since we had to rebuild the cluster anyway.

Note that we attempted to raise the replica level to 3 and THEN remove the old brick, and that failed to work. We also tried using replace-brick to switch the old brick out for the new one; that also failed with "Incorrect brick". (The replace-brick method was actually the first way we tried.)
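
For illustration, the two approaches described above would look something like the
following, assuming mel-storage02 held the brick being retired and mel-storage03
(a made-up name) is the replacement server; exact replace-brick syntax can differ
between releases:

#gluster volume add-brick condor replica 3 mel-storage03:/srv/brick/condor
#gluster volume remove-brick condor replica 2 mel-storage02:/srv/brick/condor

#gluster volume replace-brick condor mel-storage02:/srv/brick/condor mel-storage03:/srv/brick/condor commit force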

As such -- it seems there is no way to replace a failed server with a new one if you're using the Replicated setup?


Toby
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
