Re: [Gluster-users] Incorrect brick errors

2013-08-07 Thread Toby Corkindale

I never did manage to figure this out.
All attempts to replace-brick failed inexplicably; we could add-brick, 
but then still couldn't remove-brick the old one, and the new bricks 
didn't seem to be functioning properly anyway.


Eventually we just sucked it up and caused a couple of hours of downtime 
across all production servers while we brought up a whole new gluster 
cluster and moved everything to it.


That's been the final straw for us though -- we're going to ditch 
Gluster across the company as soon as possible. It's too risky to keep 
using it.
It's been unreliable and unpredictable, and version 3.3 has, if anything, 
been worse than 3.2 for bugs. (And I have no faith at all that 3.4 is an 
improvement.)


-Toby


On 07/08/13 11:44, Toby Corkindale wrote:

On 06/08/13 18:24, Toby Corkindale wrote:

Hi,
I'm getting some confusing "Incorrect brick" errors when attempting to
remove OR replace a brick.

gluster volume info condor

Volume Name: condor
Type: Replicate
Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mel-storage01:/srv/brick/condor
Brick2: mel-storage02:/srv/brick/condor

gluster volume remove-brick condor replica 1
mel-storage02:/srv/brick/condor start
Incorrect brick mel-storage02:/srv/brick/condor for volume condor


If that is the incorrect brick, then what have I done wrong?


Note that the log files don't seem to be of any use here; they just report:


E [glusterd-brick-ops.c:749:glusterd_handle_remove_brick] 0-: Incorrect
brick mel-storage02:/srv/brick/condor for volume condor


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Incorrect brick errors

2013-08-07 Thread Krishnan Parthasarathi
Hi Toby,

- Original Message -
 Hi,
 I'm getting some confusing "Incorrect brick" errors when attempting to
 remove OR replace a brick.
 
 gluster volume info condor
 
 Volume Name: condor
 Type: Replicate
 Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: mel-storage01:/srv/brick/condor
 Brick2: mel-storage02:/srv/brick/condor
 
 gluster volume remove-brick condor replica 1
 mel-storage02:/srv/brick/condor start
 Incorrect brick mel-storage02:/srv/brick/condor for volume condor
 
 
 If that is the incorrect brick, then what have I done wrong?

I agree that the error message displayed is far from helpful. The reason your
attempt to remove a brick from a 1x2 replicate volume failed is that it is not
a 'legal' operation.

Here are some rules and background, currently implicit, about how to determine
whether a remove-brick operation is allowed. Some may seem debatable, but that
is how things are today. We could refine them and evolve a better set of rules
via discussion on the mailing lists.

1) The remove-brick 'start' variant is applicable *only* when you have a dht
(or distribute) type volume. In 3.3 you can identify that by looking at the
output of gluster volume info VOLNAME: the Type field displays Distribute, or
one of its combinations such as Distributed-Replicate or Distributed-Stripe.
Additionally, even in such a volume, all the bricks belonging to a subvolume
(e.g. a whole replica pair) need to be removed in one go.
For example, let's assume a 2x2 volume V1 with bricks b1, b2, b3, b4, such that
b1,b2 form one pair and b3,b4 form the other pair.
If you wanted to use the remove-brick start variant, say to scale down the
volume, you would do the following:

#gluster volume remove-brick V1 b3 b4 start
#gluster volume remove-brick V1 b3 b4 status

Once the remove-brick operation is completed,
#gluster volume remove-brick V1 b3 b4 commit

This would leave volume V1 with bricks b1,b2.

In the above workflow, the data residing in b3,b4 is migrated to
b1,b2.

2) remove-brick (without the 'start' subcommand) can be used to reduce the
replica count down to 2 in a Distributed-Replicate type volume. As of today,
remove-brick doesn't permit reducing the replica count in a pure replicate
volume, i.e. 1xN where N >= 2.
Note: there is some activity around evolving the 'right' rule. See
http://review.gluster.com/#/c/5364/
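
To make (2) concrete with made-up names: suppose a 2x3 Distributed-Replicate
volume V2 with bricks c1..c6, where c1,c2,c3 form one replica set and c4,c5,c6
the other. Reducing the replica count from 3 to 2 would mean removing one brick
from each replica set in a single command, roughly:

#gluster volume remove-brick V2 replica 2 c3 c6

(Treat this as a sketch: the volume and brick names are purely illustrative,
and depending on the release the CLI may ask for confirmation or require an
explicit 'force'.)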

The above rules evolved from the principle that no legal command should let
the user shoot herself in the foot without a 'repair' path. Put differently,
we disallow commands that might lead to data loss without the user being fully
aware of it.

Hope that helps,
krish


 


 
 


Re: [Gluster-users] Incorrect brick errors

2013-08-07 Thread Toby Corkindale

On 08/08/13 13:09, Krishnan Parthasarathi wrote:

Hi Toby,

- Original Message -

Hi,
I'm getting some confusing "Incorrect brick" errors when attempting to
remove OR replace a brick.

gluster volume info condor

Volume Name: condor
Type: Replicate
Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mel-storage01:/srv/brick/condor
Brick2: mel-storage02:/srv/brick/condor

gluster volume remove-brick condor replica 1
mel-storage02:/srv/brick/condor start
Incorrect brick mel-storage02:/srv/brick/condor for volume condor


If that is the incorrect brick, then what have I done wrong?


I agree that the error message displayed is far from helpful. The reason your
attempt to remove a brick from a 1x2 replicate volume failed is that it is not
a 'legal' operation.

Here are some rules and background, currently implicit, about how to determine
whether a remove-brick operation is allowed. Some may seem debatable, but that
is how things are today. We could refine them and evolve a better set of rules
via discussion on the mailing lists.

1) The remove-brick 'start' variant is applicable *only* when you have a dht
(or distribute) type volume. In 3.3 you can identify that by looking at the
output of gluster volume info VOLNAME: the Type field displays Distribute, or
one of its combinations such as Distributed-Replicate or Distributed-Stripe.
Additionally, even in such a volume, all the bricks belonging to a subvolume
(e.g. a whole replica pair) need to be removed in one go.
For example, let's assume a 2x2 volume V1 with bricks b1, b2, b3, b4, such that
b1,b2 form one pair and b3,b4 form the other pair.
If you wanted to use the remove-brick start variant, say to scale down the
volume, you would do the following:

#gluster volume remove-brick V1 b3 b4 start
#gluster volume remove-brick V1 b3 b4 status

Once the remove-brick operation is completed,
#gluster volume remove-brick V1 b3 b4 commit

This would leave volume V1 with bricks b1,b2.

In the above workflow, the data residing in b3,b4 is migrated to
b1,b2.

2) remove-brick (without the 'start' subcommand) can be used to reduce the
replica count down to 2 in a Distributed-Replicate type volume. As of today,
remove-brick doesn't permit reducing the replica count in a pure replicate
volume, i.e. 1xN where N >= 2.
Note: there is some activity around evolving the 'right' rule. See
http://review.gluster.com/#/c/5364/

The above rules evolved from the principle that no legal command should let
the user shoot herself in the foot without a 'repair' path. Put differently,
we disallow commands that might lead to data loss without the user being fully
aware of it.

Hope that helps,
krish



Well, it's a bit of a moot point now, since we had to rebuild the 
cluster anyway.


Note that we attempted to raise the replica level to 3 and THEN remove 
the old brick, and that failed to work. We also tried using 
replace-brick to switch the old brick out for the new one; that also 
failed with "Incorrect brick". (The replace-brick method was actually 
the first approach we tried.)
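
For reference, the attempts looked roughly like the following. (This is a 
sketch from memory: mel-storage03 stands in for the new server's actual 
hostname, and the exact subcommands may have differed.)

#gluster volume replace-brick condor mel-storage02:/srv/brick/condor mel-storage03:/srv/brick/condor start
#gluster volume add-brick condor replica 3 mel-storage03:/srv/brick/condor
#gluster volume remove-brick condor replica 2 mel-storage02:/srv/brick/condor start

The add-brick step itself went through; it was the follow-up remove-brick 
(and the earlier replace-brick) that failed.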


As such, it seems there is no way to replace a failed server with a 
new one if you're using a replicated setup?



Toby


Re: [Gluster-users] Incorrect brick errors

2013-08-06 Thread Toby Corkindale

On 06/08/13 18:24, Toby Corkindale wrote:

Hi,
I'm getting some confusing "Incorrect brick" errors when attempting to
remove OR replace a brick.

gluster volume info condor

Volume Name: condor
Type: Replicate
Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mel-storage01:/srv/brick/condor
Brick2: mel-storage02:/srv/brick/condor

gluster volume remove-brick condor replica 1
mel-storage02:/srv/brick/condor start
Incorrect brick mel-storage02:/srv/brick/condor for volume condor


If that is the incorrect brick, then what have I done wrong?


Note that the log files don't seem to be of any use here; they just report:


E [glusterd-brick-ops.c:749:glusterd_handle_remove_brick] 0-: Incorrect 
brick mel-storage02:/srv/brick/condor for volume condor

