Re: [Gluster-users] How To Turn Off Self Heal

Pranith Kumar Karampuri Wed, 21 Jan 2015 11:03:15 -0800


On 01/17/2015 05:28 AM, Kyle Harris wrote:

Hello,
I created a post a few days ago named "Turning Off Self Heal Options Don't Appear Work?" which can be found at the following link: http://www.gluster.org/pipermail/gluster-users/2015-January/020114.html
I never got a response, so I decided to set up a test in a lab environment. I am able to reproduce the same thing, so I'm hoping someone can help me.
I have discovered over time that if a single node in a 3-node replicated cluster with many small files is off for any length of time, when it comes back on-line, it does a great deal of self-healing that can cause the glusterfs and glusterfsd processes to spike on the machines to a degree that makes them unusable. I only have one volume, with a client mount on each server where it hosts many websites running PHP. All is fine until the healing process goes into overdrive.
So, I attempted to turn off self-healing by setting the following three settings:
gluster volume set gv0 cluster.data-self-heal off
gluster volume set gv0 cluster.entry-self-heal off
gluster volume set gv0 cluster.metadata-self-heal off

hi Kyle,

Krutika wanted to send a response to you today, but we spent the whole day debugging a bug. Let me answer some of the things we already discussed on behalf of Krutika. Krutika (CCed) has found one issue where self-heal was still triggered even when some of the options are turned off. But if all the options are turned off, I think the mount process wouldn't do any heals. glustershd can still do heals, though. To disable that healing, we need to turn off the self-heal daemon using 'gluster volume set <volname> self-heal-daemon off'.
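
As a rough sketch of the point above (not a confirmed recipe), turning off the three client-side options together with the self-heal daemon could be wrapped in one helper. The function name disable_all_self_heal and the GLUSTER override variable are hypothetical, introduced only for this illustration; the option names are the ones used in this thread.

```shell
# Assumption of this sketch: GLUSTER can be overridden (e.g. GLUSTER=echo)
# to print the commands instead of running them.
GLUSTER="${GLUSTER:-gluster}"

# disable_all_self_heal VOLNAME  (hypothetical helper)
# Turns off the three mount-side heal options and the self-heal daemon.
disable_all_self_heal() {
    vol="$1"
    [ -n "$vol" ] || { echo "usage: disable_all_self_heal VOLNAME" >&2; return 1; }
    for opt in cluster.data-self-heal \
               cluster.entry-self-heal \
               cluster.metadata-self-heal \
               cluster.self-heal-daemon; do
        # Stop at the first option that fails to apply.
        $GLUSTER volume set "$vol" "$opt" off || return 1
    done
}
```

A dry run such as 'GLUSTER=echo; disable_all_self_heal gv0' only prints the four commands, which is a safe way to review them before touching a live volume.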

Note that I would rather not set gv0 cluster.self-heal-daemon off, as then I can't see what needs healing such that I can do it at a later time. Those settings appear to have no effect at all.

Ah! 3.6.2 will be able to give the output of 'gluster volume heal <volname> info' even when self-heal-daemon is turned off.

Here is how I reproduced this in my lab:

Output from "gluster volume info gv0":
Volume Name: gv0
Type: Replicate
Volume ID: a55f8619-0789-4a1c-9cda-a903bc908fd1
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.116:/export/brick1
Brick2: 192.168.1.140:/export/brick1
Brick3: 192.168.1.123:/export/brick1
Options Reconfigured:
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
cluster.data-self-heal: off
This was done using the latest version of gluster as of this writing, v3.6.1, installed on CentOS 6.6 using the rpms available from the gluster web site.
Here is how I tested:
- With all 3 nodes up, I put 4 simple text files on the cluster
- I then turned one node off
- Next I made a change to 2 of the text files
- Then I brought the previously turned off node back up
Upon doing so, I see far more than 2 of the following message in the glustershd.log:
[2015-01-15 23:19:30.471384] I [afr-self-heal-entry.c:545:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-01-15 23:19:30.494714] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-gv0-replicate-0: Completed entry selfheal on 00000000-0000-0000-0000-000000000001. source=0 sinks=
Questions:
- So is this a bug?

The log seems to suggest that it didn't find any 'sinks' to heal to, so it wouldn't have done any file creations/deletions. Maybe we should fix the log or see if there is more to that bug.
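
To tell log-only messages apart from heals that actually had a target, one could count the "Completed ... selfheal" lines that list at least one sink. This is only a sketch: the helper name count_real_heals is hypothetical, and it assumes a non-empty sink list shows up as digits right after "sinks=", which is inferred from the sample line above rather than checked against the source.

```shell
# count_real_heals LOGFILE  (hypothetical helper)
# Prints the number of "Completed ... selfheal" lines that name at least one sink.
# Assumption: a non-empty sink list starts with a digit right after "sinks=".
count_real_heals() {
    logfile="$1"
    [ -r "$logfile" ] || { echo "cannot read: $logfile" >&2; return 1; }
    # grep -c exits with status 1 when the count is 0, so mask that status.
    grep -E -c 'Completed .* selfheal on .*sinks=[0-9]' "$logfile" || true
}
```

On a default install the self-heal daemon log is usually /var/log/glusterfs/glustershd.log, so 'count_real_heals /var/log/glusterfs/glustershd.log' would be the typical call. A lone "sinks=" with nothing after it, as in the lines quoted above, is not counted.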

- Why am I seeing "entry selfheal" messages when this feature is supposed to be turned off?

Because glustershd can still do self-heals, as we didn't disable it?

- Also, why am I seeing far more selfheal messages than 2 when I only changed 2 files while the single node was down?

At the moment, I believe they are just log messages and not really heals. But we will need to look further and find if there is more to it.

- Finally, how do I really turn off these selfheals that are taking place without completely turning off the cluster.self-heal-daemon for reasons mentioned above?

There are 2 workarounds until 3.6.2 is released for this:

1) As a workaround, maybe we can turn self-heal-daemon off. When we want to see the files that need healing, we can turn it on, see the information and turn it off immediately. This broken functionality made it into 3.6.1 because I couldn't re-implement the feature for afrv2 in time for the release. Sorry about that!

2) The other way to do it is to inspect the gfids of the files that need heal directly by looking at the directory <brick-path>/.glusterfs/indices/xattrop. This is where self-heal-daemon looks to find the files that need healing.
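
Both workarounds can be sketched as small shell helpers. The names peek_heal_info and list_pending_gfids, and the GLUSTER override variable, are hypothetical and exist only for this illustration. Two assumptions are worth stating: the daemon may need a moment after being switched on before 'heal info' returns anything useful (and may start healing while it is on), and entries in the index directory whose names begin with "xattrop-" are taken to be the index's own base file rather than a gfid.

```shell
# Assumption of this sketch: GLUSTER can be overridden (e.g. GLUSTER=echo)
# to print the commands instead of running them.
GLUSTER="${GLUSTER:-gluster}"

# peek_heal_info VOLNAME  (hypothetical helper, workaround 1)
# Turns the self-heal daemon on, shows heal info, then turns it off again.
peek_heal_info() {
    vol="$1"
    [ -n "$vol" ] || { echo "usage: peek_heal_info VOLNAME" >&2; return 1; }
    $GLUSTER volume set "$vol" cluster.self-heal-daemon on || return 1
    $GLUSTER volume heal "$vol" info
    status=$?
    # Always try to switch the daemon back off, even if heal info failed.
    $GLUSTER volume set "$vol" cluster.self-heal-daemon off || return 1
    return "$status"
}

# list_pending_gfids BRICK_PATH  (hypothetical helper, workaround 2)
# Prints the entry names found under BRICK_PATH/.glusterfs/indices/xattrop.
list_pending_gfids() {
    dir="$1/.glusterfs/indices/xattrop"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    for f in "$dir"/*; do
        [ -e "$f" ] || continue            # empty directory: glob did not match
        name="${f##*/}"
        case "$name" in
            xattrop-*) continue ;;         # assumed to be the base file, not a gfid
        esac
        printf '%s\n' "$name"
    done
}
```

For the volume in this thread that would be 'peek_heal_info gv0' on any node, and 'list_pending_gfids /export/brick1' on each brick host. The second helper only reads the brick directory, so it works while the daemon stays off.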

You were saying you know a way to make machines unusable by triggering self-heals. It would be very good if we can replicate that test in our labs. Wondering if you have any pointers for us to do the same.


Pranith


Thank you for any insight you may be able to provide on this.

--
Kyle


_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
