I've got 3 systems I'd like to set up as bricks for a distribute-only volume.
I understand that if one of them fails, the volume will still serve reads and
writes for files that hash to the non-failed bricks.
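
For concreteness, the setup I'm picturing is something like the following
(the volume name, hostnames, and brick paths are placeholders I'm making up
for illustration):

    # three servers, each exporting a ZFS dataset as a brick; with no
    # replica/disperse keyword this creates a pure distribute volume
    gluster volume create myvol \
        gfs1:/tank/brick gfs2:/tank/brick gfs3:/tank/brick
    gluster volume start myvol

    # clients mount the volume via FUSE
    mount -t glusterfs gfs1:/myvol /mnt/myvol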

What I'm wondering is whether I could forcibly remove the failed brick from
the volume and get the remaining two systems to fix the layout to cover only
the two non-failed bricks, and then resume writes for all filenames. The
reasoning is that the failed brick would be expected to come back up
eventually. Each system is internally redundant, at least from a data
standpoint (ZFS RAIDZ2), and the kinds of failures that would take a brick
down (say, a fried motherboard) are unlikely to be permanent and
irrecoverable, but could keep the system out of commission long enough that I
might not want the volume down for the whole time it takes to repair the
failed system.
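
Concretely, while (say) the third brick is dead, I imagine the manual
intervention would look roughly like this, using the same placeholder names
as above; I'm not certain remove-brick will even accept a brick whose server
is unreachable, which is part of what I'm asking:

    # drop the dead brick from the volume definition without migrating data
    # (migration is impossible anyway while the brick is offline)
    gluster volume remove-brick myvol gfs3:/tank/brick force

    # recompute the DHT layout so new files hash only to the two live bricks
    gluster volume rebalance myvol fix-layout start
    gluster volume rebalance myvol status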

This is a write-mostly workload, and writes always go to new files. With my
workload, there is no risk of duplicated file names while the failed brick is
away. If 1/3 of the files disappeared for a while and came back later, that
would be acceptable as long as I could write to the volume in the interim.
It's fine if this requires manual intervention. My theory on how to bring the
brick back would simply be to re-add the repaired brick, with all its
existing data intact, and then do the trick where you run "find" over the
whole volume from a client mount to get Gluster to recognize all the files.
At that point, you'd be back in business with the whole volume again.
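
In command terms, the re-add step I'm imagining is roughly the following
(again with the placeholder names from above); I'd guess add-brick may object
to a path that was previously part of a volume, hence the force, but that's
exactly the kind of thing I'd like confirmed:

    # put the repaired brick back into the volume, with its data left in place
    gluster volume add-brick myvol gfs3:/tank/brick force

    # fix the layout again so hash ranges cover all three bricks
    gluster volume rebalance myvol fix-layout start

    # walk the volume from a client mount so every file gets looked up once;
    # DHT can then create linkto entries for files that now sit on the
    # "wrong" brick for their hash
    find /mnt/myvol -noleaf -print0 | xargs -0 stat > /dev/null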
