Re: [Gluster-users] Disperse volume recovery and healing

Victor T Fri, 16 Mar 2018 05:27:07 -0700

Xavi, does that mean that if every node were rebooted one at a time, even 
without issuing a heal, the volume would have no issues after running 
gluster volume heal [volname] once all bricks are back online?

________________________________
From: Xavi Hernandez <[email protected]>
Sent: Thursday, March 15, 2018 12:09:05 AM
To: Victor T
Cc: [email protected]
Subject: Re: [Gluster-users] Disperse volume recovery and healing

Hi Victor,

On Wed, Mar 14, 2018 at 12:30 AM, Victor T <[email protected]> wrote:

I have a question about how disperse volumes handle brick failure. I'm running 
version 3.10.10 on all systems. If I have a disperse volume in a 4+2 
configuration with 6 servers each serving 1 brick, and maintenance needs to be 
performed on all systems, are there any general steps that need to be taken to 
ensure data is not lost or service interrupted? For example, can I just reboot 
each system sequentially after making sure the service is running on all 
servers before rebooting the next system? Or is there a need to force/wait for 
a heal after each brick comes back online? If I have two bricks down for 
multiple days and then bring them back in, is there a need to issue a heal or 
something like a rebalance before rebooting the other servers? There's lots of 
documentation about other volume types, but it seems information specific to 
dispersed volumes is a bit hard to find. Thanks a bunch.

On a 4+2 configuration you could bring down up to 2 bricks simultaneously for 
maintenance. However if something happens to one of the remaining 4 bricks, the 
volume would stop working. So in this case I would recommend not having more 
than one server down for maintenance at the same time unless the downtime is 
very short.

Once the stopped servers come back up again, you need to wait until all files 
are healed before proceeding with the next server. Failing to do so means that 
some files could have more than 2 non-healthy versions, which will make the file 
inaccessible until enough healthy versions are available again.
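
The reasoning above can be illustrated with a toy model. This is only a sketch, 
not how Gluster tracks fragment state internally: it assumes a single file that 
is written to while each brick is down, so every rebooted brick comes back with 
a stale fragment until it is healed. The function names are hypothetical.

```python
DATA, REDUNDANCY = 4, 2          # the 4+2 disperse configuration from this thread
BRICKS = DATA + REDUNDANCY


def readable(unavailable):
    """The file is readable while at least DATA fragments are healthy and online."""
    return BRICKS - len(unavailable) >= DATA


def rolling_reboot(heal_between):
    """Reboot bricks one at a time.

    Returns the index of the brick whose reboot makes the file inaccessible,
    or None if the whole rolling reboot completes without losing access.
    """
    stale = set()                      # bricks holding an outdated fragment
    for brick in range(BRICKS):
        # While this brick is down, neither it nor the stale bricks can serve the file.
        if not readable(stale | {brick}):
            return brick
        stale.add(brick)               # assumption: it missed a write while down
        if heal_between:
            stale.clear()              # waiting for self-heal repairs every stale fragment
    return None


print(rolling_reboot(heal_between=False))
print(rolling_reboot(heal_between=True))
```

Under these assumptions, skipping the heal leaves two stale fragments after the 
second reboot, so taking a third brick down leaves only three healthy fragments, 
one fewer than the four required.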

Self-heal should be automatically triggered once the bricks come online, 
however there was a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1547662) 
that could cause delays in the self-heal process. This bug should be fixed in 
the next version. In the meantime, you can force self-heal to progress by issuing 
"gluster volume heal <volname>" commands each time it seems to have stopped.

Once the output of "gluster volume heal <volname> info" reports 0 pending files 
on all bricks, you can proceed with the maintenance of the next server.
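
The check described above could be scripted roughly as follows. This is a 
minimal sketch that assumes the heal info output contains one 
"Number of entries: N" line per brick; that format may differ between versions, 
so verify it against your own output first. The function names and the 
expected_bricks parameter are hypothetical helpers, not part of Gluster.

```python
import re
import subprocess


def safe_to_proceed(heal_info_output, expected_bricks):
    """True only if every expected brick reports exactly 0 pending entries.

    A missing brick or a non-numeric counter (assumed to mean the brick is
    not connected) is treated as "not safe".
    """
    counters = re.findall(r"^Number of entries:\s*(\S+)",
                          heal_info_output, re.MULTILINE)
    return len(counters) == expected_bricks and all(c == "0" for c in counters)


def volume_is_healed(volname, expected_bricks):
    """Run the heal info command quoted in this thread and check its output."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volname, "info"],
        capture_output=True, text=True, check=True,
    )
    return safe_to_proceed(result.stdout, expected_bricks)
```

For the 4+2 volume discussed here, one would call volume_is_healed with the 
volume name and expected_bricks=6 after each server comes back, and only move 
on to the next server once it returns True.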

No need to do any rebalance for down bricks. Rebalance is basically needed when 
a volume is expanded with more bricks.

Xavi

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
