Hello, I created a post a few days ago named "Turning Off Self Heal Options Don't Appear Work?" which can be found at the following link: http://www.gluster.org/pipermail/gluster-users/2015-January/020114.html
I never got a response, so I set up a test in a lab environment. I can reproduce the same behavior there, so I'm hoping someone can help me.

I have discovered over time that if a single node in a 3-node replicated cluster with many small files is off for any length of time, then when it comes back online it does a great deal of self-healing. This can cause the glusterfs and glusterfsd processes to spike on the machines to a degree that makes them unusable. I only have one volume, with a client mount on each server, and it hosts many websites running PHP. All is fine until the healing process goes into overdrive.

So, I attempted to turn off self-healing by setting the following three options:

    gluster volume set gv0 cluster.data-self-heal off
    gluster volume set gv0 cluster.entry-self-heal off
    gluster volume set gv0 cluster.metadata-self-heal off

Note that I would rather not run "gluster volume set gv0 cluster.self-heal-daemon off", because then I can't see what needs healing in order to do it at a later time.

Those settings appear to have no effect at all. Here is how I reproduced this in my lab.

Output from "gluster volume info gv0":

    Volume Name: gv0
    Type: Replicate
    Volume ID: a55f8619-0789-4a1c-9cda-a903bc908fd1
    Status: Started
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: 192.168.1.116:/export/brick1
    Brick2: 192.168.1.140:/export/brick1
    Brick3: 192.168.1.123:/export/brick1
    Options Reconfigured:
    cluster.metadata-self-heal: off
    cluster.entry-self-heal: off
    cluster.data-self-heal: off

This was done using the latest version of gluster as of this writing, v3.6.1, installed on CentOS 6.6 using the RPMs available from the gluster web site.
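To double-check that all three options really show as "off" on a volume, a small helper along these lines could be used. This is only a sketch: the function name check_selfheal_opts is made up for illustration, and it assumes the "option: value" layout shown in the volume info output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster): reads "gluster volume info"
# output on stdin and reports whether each of the three self-heal options
# is listed as "off". Returns non-zero if any of them is not.
check_selfheal_opts() {
    info=$(cat)
    status=0
    for opt in cluster.data-self-heal cluster.entry-self-heal cluster.metadata-self-heal; do
        # Match a line of the form "<option>: off", allowing stray whitespace.
        if printf '%s\n' "$info" | grep -Eq "^[[:space:]]*${opt}:[[:space:]]*off[[:space:]]*\$"; then
            echo "$opt: off"
        else
            echo "$opt: not reported as off"
            status=1
        fi
    done
    return $status
}
```

It would be run as "gluster volume info gv0 | check_selfheal_opts". Note that it only confirms what the volume configuration records; it says nothing about whether the self-heal daemon honors those options.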

Here is how I tested:

- With all 3 nodes up, I put 4 simple text files on the cluster
- I then turned one node off
- Next I made a change to 2 of the text files
- Then I brought the previously turned-off node back up

Upon doing so, I see far more than 2 of the following messages in glustershd.log:

    [2015-01-15 23:19:30.471384] I [afr-self-heal-entry.c:545:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
    [2015-01-15 23:19:30.494714] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-gv0-replicate-0: Completed entry selfheal on 00000000-0000-0000-0000-000000000001. source=0 sinks=

Questions:

- So is this a bug?
- Why am I seeing "entry selfheal" messages when this feature is supposed to be turned off?
- Also, why am I seeing far more than 2 selfheal messages when I only changed 2 files while the single node was down?
- Finally, how do I really turn off these selfheals without completely turning off cluster.self-heal-daemon, for the reasons mentioned above?

Thank you for any insight you may be able to provide on this.

-- Kyle
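
For anyone trying to reproduce this, a rough sketch for counting the completed selfheals by type in the log follows. The function name count_selfheals is made up for illustration. It assumes the "Completed <type> selfheal on" wording shown in the entry message above, and that data and metadata heals are logged in the same form, which is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: counts "Completed <type> selfheal on ..." lines per
# heal type. Reads the files given as arguments, or stdin if none are given.
count_selfheals() {
    awk '/Completed (entry|data|metadata) selfheal on / {
        # The heal type is the word right after "Completed".
        for (i = 1; i <= NF; i++)
            if ($i == "Completed") { n[$(i + 1)]++; break }
    }
    END { for (t in n) print t, n[t] }' "$@" | sort
}
```

It would be run as "count_selfheals /var/log/glusterfs/glustershd.log" (that path is the usual default location and may differ on other installs). Comparing the counts against the number of files changed while the node was down makes the discrepancy easy to see.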
