Re: [Gluster-users] Incorrect brick errors
On 07/08/13 11:44, Toby Corkindale wrote:
> On 06/08/13 18:24, Toby Corkindale wrote:
>> Hi,
>> I'm getting some confusing "Incorrect brick" errors when attempting to
>> remove OR replace a brick.
>> [...]
>> # gluster volume remove-brick condor replica 1 mel-storage02:/srv/brick/condor start
>> Incorrect brick mel-storage02:/srv/brick/condor for volume condor
>>
>> If that is the incorrect brick, then what have I done wrong?

I never did manage to figure this out. All attempts to replace-brick failed inexplicably; we could add-brick, but then still could not remove-brick the old brick, and the new bricks didn't seem to be functioning properly anyway. Eventually we just sucked it up and took a couple of hours of downtime across all production servers while we brought up a whole new Gluster cluster and moved everything to it.

That's been the final straw for us, though -- we're going to ditch Gluster across the company as soon as possible. It's too risky to keep using: it has been unreliable and unpredictable, and if anything, version 3.3 has been worse than 3.2 for bugs. (And I have no faith at all that 3.4 is an improvement.)
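For what it's worth, "brought up a whole new gluster cluster" here means creating a fresh replicated volume on new hosts and copying the data across. A minimal sketch of that, using entirely hypothetical host names (new-storage01 and new-storage02) and 3.3-era command syntax -- an illustration, not the exact commands we ran:

  # gluster peer probe new-storage02
  # gluster volume create condor replica 2 new-storage01:/srv/brick/condor new-storage02:/srv/brick/condor
  # gluster volume start condor

The data then has to be copied onto a client mount of the new volume by ordinary means (e.g. rsync), since the old volume couldn't be migrated in place.

-Toby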
Re: [Gluster-users] Incorrect brick errors
Hi Toby,

----- Original Message -----
> Hi,
> I'm getting some confusing "Incorrect brick" errors when attempting to
> remove OR replace a brick.
>
> # gluster volume info condor
>
> Volume Name: condor
> Type: Replicate
> Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: mel-storage01:/srv/brick/condor
> Brick2: mel-storage02:/srv/brick/condor
>
> # gluster volume remove-brick condor replica 1 mel-storage02:/srv/brick/condor start
> Incorrect brick mel-storage02:/srv/brick/condor for volume condor
>
> If that is the incorrect brick, then what have I done wrong?
>
> thanks,
> Toby

I agree that the error message displayed is far from helpful. The reason your attempt to remove a brick from a 1x2 replicate volume failed is that it is not a 'legal' operation.

Here are some rules, and some background, about how to determine whether a remove-brick operation is allowed. The rules are implicit, and some may seem debatable, but that is how things are today. We could refine them and evolve a better set of rules through discussion on the mailing lists.

1) The remove-brick 'start' variant is applicable *only* when you have a dht (or distribute) type volume. In 3.3, you can identify that from the output of "gluster volume info VOLNAME": the Type field will display "Distribute-<something>". Additionally, even in a Distribute type volume -- which includes Distribute-Replicate, Distribute-Stripe and other combinations -- all the bricks belonging to a subvolume must be removed in one go.

For example, let's assume a 2x2 volume V1 with bricks b1, b2, b3 and b4, such that b1,b2 form one replica pair and b3,b4 form the other. If you wanted to use the remove-brick 'start' variant, say for scaling down the volume, you would do the following:

  # gluster volume remove-brick V1 b3 b4 start
  # gluster volume remove-brick V1 b3 b4 status

Once the remove-brick operation has completed:

  # gluster volume remove-brick V1 b3 b4 commit

This leaves volume V1 with bricks b1,b2. In the above workflow, the data residing on b3,b4 is migrated to b1,b2.

2) remove-brick (without the 'start' subcommand) can be used to reduce the replica count down to 2 in a Distribute-Replicate type volume. As of today, remove-brick does not permit reducing the replica count of a pure replicate volume, i.e. 1xN, where N >= 2.

Note: There is some activity around evolving the 'right' rule here. See http://review.gluster.com/#/c/5364/

The above rules have evolved from the principle that no legal command should let users shoot themselves in the foot without a 'repair' path. Put differently, we disallow commands that might lead to data loss without the user being fully aware of it.
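Rule 2 above doesn't show a command, so here is a minimal sketch of reducing the replica count, assuming a hypothetical 2x3 Distribute-Replicate volume V2 whose replica sets are (b1,b2,b3) and (b4,b5,b6). You drop one brick from each replica set to go from replica 3 to replica 2. Exact syntax can vary between releases (newer ones may insist on an explicit 'force'), so treat this as illustrative:

  # gluster volume remove-brick V2 replica 2 b3 b6

Hope that helps,
krish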
Re: [Gluster-users] Incorrect brick errors
On 08/08/13 13:09, Krishnan Parthasarathi wrote:
> I agree that the error message displayed is far from helpful. The reason
> your attempt to remove a brick from a 1x2 replicate volume failed is that
> it is not a 'legal' operation.
> [...]
> 2) remove-brick (without the 'start' subcommand) can be used to reduce
> the replica count down to 2 in a Distribute-Replicate type volume. As of
> today, remove-brick does not permit reducing the replica count of a pure
> replicate volume, i.e. 1xN, where N >= 2.

Well, it's a bit of a moot point now, since we had to rebuild the cluster anyway.

Note that we attempted to raise the replica level to 3 and THEN remove the old brick, and that failed to work. We also tried using replace-brick to switch the old brick out for the new one; that failed with "Incorrect brick" too. (The replace-brick method was actually the first way we tried.)

As such, it seems there is no way to replace a failed server with a new one if you're using the Replicated setup?
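Roughly, the attempts looked like this, with mel-storage03 standing in for the new server's brick (these are reconstructions, not the exact command lines we ran). First the add-then-remove route:

  # gluster volume add-brick condor replica 3 mel-storage03:/srv/brick/condor
  # gluster volume remove-brick condor replica 2 mel-storage02:/srv/brick/condor

and the replace-brick route:

  # gluster volume replace-brick condor mel-storage02:/srv/brick/condor mel-storage03:/srv/brick/condor start

Toby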
Re: [Gluster-users] Incorrect brick errors
On 06/08/13 18:24, Toby Corkindale wrote:
> Hi,
> I'm getting some confusing "Incorrect brick" errors when attempting to
> remove OR replace a brick.
>
> # gluster volume info condor
>
> Volume Name: condor
> Type: Replicate
> Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: mel-storage01:/srv/brick/condor
> Brick2: mel-storage02:/srv/brick/condor
>
> # gluster volume remove-brick condor replica 1 mel-storage02:/srv/brick/condor start
> Incorrect brick mel-storage02:/srv/brick/condor for volume condor
>
> If that is the incorrect brick, then what have I done wrong?

Note that the log files don't seem to be of any use here; they just report:

E [glusterd-brick-ops.c:749:glusterd_handle_remove_brick] 0-: Incorrect brick mel-storage02:/srv/brick/condor for volume condor