I assume the options were off by default but I'll turn them back on.  I'm working on getting the information.

On 12/28/18 1:00 AM, Ashish Pandey wrote:

Hi Brett,

First the answers of all your questions -

1.  If a self-heal daemon is listed on a host (all of mine show one with
a volume status command), can I assume it's enabled and running?

For your volume "projects", the self-heal daemon is up and running.

2.  I assume the volume that has all the self-heals pending has some
serious issues even though I can access the files and directories on
it.  If self-heal is running shouldn't the numbers be decreasing?

It should heal the entries, and the number of entries reported by the "gluster v heal <volname> info" command should be decreasing.
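
For example, to monitor whether the counts are dropping (a sketch using your volume name "projects"):

# full list of entries pending heal, per brick
gluster volume heal projects info
# condensed per-brick counts; re-run periodically to see if they drop
gluster volume heal projects info summary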

It appears to me that self-heal is not working properly, so how do I get it
to start working, or should I delete the volume and start over?

Since you can access all the files from the mount point, I think the volume and the files are in a good state as of now. I would not delete the volume before trying to fix it; if there is no fix, or the fix takes too long, you can go ahead with that option.

-----------------------
Why are all these options off?

performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off

Although this should not matter for your issue, I think you should enable all of the above unless you have a reason not to.
--------------------

I would like you to perform the following steps and provide some more information -

1 - Try to restart self-heal and see if that helps.
"gluster v start <volname> force" will kill and restart the self-heal processes.

2 - If step 1 is not fruitful, get the list of entries that need to be healed and pick one entry to heal. I mean we should focus on one entry and find out why it is
not getting healed, instead of looking at all the 5900 entries. Let's call it entry1.
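
For example, running this on any of the servers lists the pending entries so you can pick one:

gluster volume heal projects info | head -n 40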

3 - Now access entry1 from the mount point: read from it and write to it, then check the heal info to see if this entry has been healed. Accessing a file from the mount point triggers client-side heal,
which could also heal the file.
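
As a sketch, assuming the volume is mounted at /mnt/projects and entry1 is /a/b/c/entry1 (adjust both to your setup):

# reading the file through the FUSE mount triggers client-side heal
stat /mnt/projects/a/b/c/entry1
cat /mnt/projects/a/b/c/entry1 > /dev/null
# then re-check
gluster volume heal projects info summary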

4 - Check the logs in /var/log/glusterfs; the mount logs and the glustershd logs should be checked and provided.
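
For example (the glustershd log name below is the default; the mount log name depends on your client's mount point, so adjust it accordingly):

# self-heal daemon log, on each server
less /var/log/glusterfs/glustershd.log
# FUSE mount log on the client, named after the mount point, e.g.
less /var/log/glusterfs/mnt-projects.log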

5 - Get the extended attributes of entry1 from all the bricks.

If the path of entry1 on the mount point is /a/b/c/entry1, then you have to run the following command on all the nodes -

getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1

Please provide the output of the above command too.
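
For reference (assuming standard replicate/AFR behaviour), the output should contain trusted.afr.projects-client-N attributes. A value such as

trusted.afr.projects-client-4=0x000000000000000000000000

means no pending heal is recorded against that brick, while non-zero counters indicate pending data/metadata/entry heals. Comparing these values for entry1 across the six bricks should show which copy the others consider stale.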

---
Ashish

------------------------------------------------------------------------
*From: *"Brett Holcomb" <[email protected]>
*To: *[email protected]
*Sent: *Friday, December 28, 2018 3:49:50 AM
*Subject: *Re: [Gluster-users] Self Heal Confusion

Resending as I did not reply to the list earlier; Thunderbird replied to the poster and not the list.

On 12/27/18 11:46 AM, Brett Holcomb wrote:

    Thank you, I appreciate the help.  Here is the information.  Let me
    know if you need anything else.  I'm fairly new to Gluster.

    Gluster version is 5.2

    1. gluster v info

    Volume Name: projects
    Type: Distributed-Replicate
    Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 2 x 3 = 6
    Transport-type: tcp
    Bricks:
    Brick1: gfssrv1:/srv/gfs01/Projects
    Brick2: gfssrv2:/srv/gfs01/Projects
    Brick3: gfssrv3:/srv/gfs01/Projects
    Brick4: gfssrv4:/srv/gfs01/Projects
    Brick5: gfssrv5:/srv/gfs01/Projects
    Brick6: gfssrv6:/srv/gfs01/Projects
    Options Reconfigured:
    cluster.self-heal-daemon: enable
    performance.quick-read: off
    performance.parallel-readdir: off
    performance.readdir-ahead: off
    performance.write-behind: off
    performance.read-ahead: off
    performance.client-io-threads: off
    nfs.disable: on
    transport.address-family: inet
    server.allow-insecure: on
    storage.build-pgfid: on
    changelog.changelog: on
    changelog.capture-del-path: on

    2.  gluster v status

    Status of volume: projects
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick gfssrv1:/srv/gfs01/Projects           49154     0          Y       7213
    Brick gfssrv2:/srv/gfs01/Projects           49154     0          Y       6932
    Brick gfssrv3:/srv/gfs01/Projects           49154     0          Y       6920
    Brick gfssrv4:/srv/gfs01/Projects           49154     0          Y       6732
    Brick gfssrv5:/srv/gfs01/Projects           49154     0          Y       6950
    Brick gfssrv6:/srv/gfs01/Projects           49154     0          Y       6879
    Self-heal Daemon on localhost               N/A       N/A        Y       11484
    Self-heal Daemon on gfssrv2                 N/A       N/A        Y       10366
    Self-heal Daemon on gfssrv4                 N/A       N/A        Y       9872
    Self-heal Daemon on srv-1-gfs3.corp.l1049h.net
                                                N/A       N/A        Y       9892
    Self-heal Daemon on gfssrv6                 N/A       N/A        Y       10372
    Self-heal Daemon on gfssrv5                 N/A       N/A        Y       10761

    Task Status of Volume projects
    ------------------------------------------------------------------------------
    There are no active volume tasks

    3. I've given the summary since the actual list for two volumes is
    around 5900 entries.

    Brick gfssrv1:/srv/gfs01/Projects
    Status: Connected
    Total Number of entries: 85
    Number of entries in heal pending: 85
    Number of entries in split-brain: 0
    Number of entries possibly healing: 0

    Brick gfssrv2:/srv/gfs01/Projects
    Status: Connected
    Total Number of entries: 0
    Number of entries in heal pending: 0
    Number of entries in split-brain: 0
    Number of entries possibly healing: 0

    Brick gfssrv3:/srv/gfs01/Projects
    Status: Connected
    Total Number of entries: 0
    Number of entries in heal pending: 0
    Number of entries in split-brain: 0
    Number of entries possibly healing: 0

    Brick gfssrv4:/srv/gfs01/Projects
    Status: Connected
    Total Number of entries: 0
    Number of entries in heal pending: 0
    Number of entries in split-brain: 0
    Number of entries possibly healing: 0

    Brick gfssrv5:/srv/gfs01/Projects
    Status: Connected
    Total Number of entries: 58854
    Number of entries in heal pending: 58854
    Number of entries in split-brain: 0
    Number of entries possibly healing: 0

    Brick gfssrv6:/srv/gfs01/Projects
    Status: Connected
    Total Number of entries: 58854
    Number of entries in heal pending: 58854
    Number of entries in split-brain: 0
    Number of entries possibly healing: 0

    On 12/27/18 3:09 AM, Ashish Pandey wrote:

        Hi Brett,

        Could you please tell us more about the setup?

        1 - Gluster v info
        2 - gluster v status
        3 - gluster v heal <volname> info

        This is the very basic information needed to start debugging
        or suggesting any workaround.
        It should always be included when asking such questions on the
        mailing list so that people can reply sooner.


        Note: Please hide IP addresses/hostnames or any other information
        you don't want the world to see.

        ---
        Ashish

        ------------------------------------------------------------------------
        *From: *"Brett Holcomb" <[email protected]>
        *To: *[email protected]
        *Sent: *Thursday, December 27, 2018 12:19:15 AM
        *Subject: *Re: [Gluster-users] Self Heal Confusion

        Still no change in the heals pending.  I found this reference,
https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf,
        which mentions the default SELinux context for a brick and says
        that internal operations such as self-heal and rebalance should
        be ignored, but it does not elaborate on what "ignored" means -
        is it just not doing self-heal, or something else?

        I did set SELinux to permissive and nothing changed.  I'll try
        setting the bricks to the context mentioned in this pdf and
        see what happens.


        On 12/20/18 8:26 PM, John Strunk wrote:

            Assuming your bricks are up... yes, the heal count should
            be decreasing.

            There is/was a bug wherein self-heal would stop healing
            but would still be running. I don't know whether your
            version is affected, but the remedy is to just restart the
            self-heal daemon.
            Force start one of the volumes that has heals pending. The
            bricks are already running, but it will cause shd to
            restart and, assuming this is the problem, healing should
            begin...

            $ gluster vol start my-pending-heal-vol force

            Others could better comment on the status of the bug.

            -John


            On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb
            <[email protected] <mailto:[email protected]>> wrote:

                I have one volume that has 85 pending entries in
                healing and two more
                volumes with 58,854 entries in healing pending. These
                numbers are from
                the volume heal info summary command.  They have
                stayed constant for two
                days now.  I've read the gluster docs and many more. 
                The Gluster docs
                just give some commands and non gluster docs basically
                repeat that.
                Given that it appears no self-healing is going on for
                my volume I am
                confused as to why.

                1.  If a self-heal daemon is listed on a host (all of
                mine show one with
                a volume status command), can I assume it's enabled and
                running?

                2.  I assume the volume that has all the self-heals
                pending has some
                serious issues even though I can access the files and
                directories on
                it.  If self-heal is running shouldn't the numbers be
                decreasing?

                It appears to me self-heal is not working properly, so
                how do I get it to
                start working, or should I delete the volume and start
                over?

                I'm running gluster 5.2 on Centos 7 latest and updated.

                Thank you.






_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users

