Re: [Gluster-users] [External] Re: Self Heal Confusion

Brett Holcomb Tue, 01 Jan 2019 08:59:17 -0800

Healing time set to 120 seconds for now.

Just to make sure I understand: I need to take the output of gluster volume heal projects info and put it in a file. Then, for each brick listed in the output as having unhealed files, I try to find each gfid from that file in the brick's .glusterfs directory and delete the matching file if it exists. If it doesn't exist, I don't worry about it.


So these bricks have unhealed entries listed

/srv/gfs01/Projects/.glusterfs - 85 files

/srv/gfs05/Projects/.glusterfs - 58854 files

/srv/gfs06/Projects/.glusterfs - 58854 files

Script time!
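
A rough, non-destructive sketch of such a script follows. It assumes the heal output was saved to a file (for example with gluster volume heal projects info > heal-info.txt), that entries without a path show up as <gfid:UUID> lines, and that a brick keeps each GFID entry at .glusterfs/<first two characters>/<next two characters>/<gfid>. The function names and the file name are made up for illustration, and it only prints what it finds rather than deleting anything.

```shell
#!/bin/sh
# Sketch only: list GFID entries that exist under a brick's .glusterfs directory.
# Assumption: heal-info entries without a path look like "<gfid:UUID>".

# Print the unique GFIDs found in a saved heal-info file ($1).
list_gfids() {
    sed -n 's/^<gfid:\([0-9a-fA-F-]*\)>.*$/\1/p' "$1" | sort -u
}

# Print where a GFID ($2) would live under a brick root ($1).
gfid_path() {
    a=$(printf '%s' "$2" | cut -c1-2)
    b=$(printf '%s' "$2" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$1" "$a" "$b" "$2"
}

# Print the GFID entries from heal-info file ($2) that exist on brick ($1).
list_candidates() {
    list_gfids "$2" | while read -r gfid; do
        p=$(gfid_path "$1" "$gfid")
        # -L also reports entries that are dangling symlinks.
        if [ -e "$p" ] || [ -L "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
}

# Example (run on the node that hosts the brick):
# list_candidates /srv/gfs05/Projects heal-info.txt
```

Reviewing the printed list before removing anything seems prudent, since a path in that list is a real entry on the brick.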

On 12/31/18 4:39 AM, Davide Obbi wrote:

cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
cluster.quorum-reads                    no

Where exactly do I remove the gfid entries from - the .glusterfs directory? --> Yes. I can't remember exactly where, but try a find in the brick paths with the gfid; it should return something.
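
The find mentioned here could look roughly like the following. The function name is hypothetical and the GFID in the usage comment is a placeholder; the brick path is one of those named in this thread.

```shell
# Sketch: search a brick's .glusterfs tree for an entry named after a GFID.
# $1 = brick root, $2 = GFID string taken from the heal info output.
find_gfid() {
    find "$1/.glusterfs" -name "$2" -print
}

# Example (placeholder GFID):
# find_gfid /srv/gfs05/Projects 01234567-89ab-cdef-0123-456789abcdef
```

A match, if there is one, would normally sit at .glusterfs/<first two characters of the gfid>/<next two characters>/<gfid>. No output means the brick has no entry for that gfid.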

Where do I put the cluster.heal-timeout option - which file? --> gluster volume set volumename option value
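
Spelled out, that command might be wrapped as below. This is a sketch: the wrapper name is hypothetical, the volume name "projects" is assumed from the heal command used elsewhere in this thread, and the value is in seconds.

```shell
# Sketch: set the self-heal interval on a volume, then read it back.
# $1 = volume name, $2 = interval in seconds. The wrapper name is hypothetical.
set_heal_timeout() {
    gluster volume set "$1" cluster.heal-timeout "$2" &&
        gluster volume get "$1" cluster.heal-timeout
}

# Example, assuming the volume is called "projects":
# set_heal_timeout projects 120
```

The option is set through the gluster CLI rather than by editing a file, and the read-back with gluster volume get is only there to confirm the value took effect.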

On Mon, Dec 31, 2018 at 10:34 AM Brett Holcomb <[email protected]> wrote:


    That is probably the case as a lot of files were deleted some time
    ago.

    I'm on version 5.2 but was on 3.12 until about a week ago.

    Here is the quorum info.  I'm running a distributed replicated volume
    in 2 x 3 = 6.

    cluster.quorum-type auto
    cluster.quorum-count (null)
    cluster.server-quorum-type off
    cluster.server-quorum-ratio 0
    cluster.quorum-reads                    no

    Where exactly do I remove the gfid entries from - the .glusterfs
    directory?  Do I just delete all the directories and files under this
    directory?

    Where do I put the cluster.heal-timeout option - which file?

    I think you've hit on the cause of the issue.  Thinking back, we've
    had some extended power outages, and due to a misconfiguration in the
    swap file device name a couple of the nodes did not come up and I
    didn't catch it for a while, so maybe the deletes occurred then.

    Thank you.

    On 12/31/18 2:58 AM, Davide Obbi wrote:
    > If the long GFID does not correspond to any file, it could mean the
    > file has been deleted by the client mounting the volume. I think this
    > happens when the delete was issued while the number of active bricks
    > did not reach quorum majority, or when a second brick was taken down
    > while another was down or had not finished the self-heal, the latter
    > more likely.
    > It would be interesting to see:
    > - what version of glusterfs you are running; it happened to me with 3.12
    > - volume quorum rules: "gluster volume get vol all | grep quorum"
    >
    > To clean it up, if I remember correctly, it should be possible to
    > delete the gfid entries from the brick mounts on the glusterfs server
    > nodes reporting the files to heal.
    >
    > As a side note, you might want to consider changing the self-heal
    > timeout to a more aggressive schedule with the cluster.heal-timeout
    > option.
    _______________________________________________
    Gluster-users mailing list
    [email protected] <mailto:[email protected]>
    https://lists.gluster.org/mailman/listinfo/gluster-users



--
Davide Obbi
System Administrator

Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
Direct +31207031558
Booking.com <https://www.booking.com/>
Empowering people to experience the world since 1996

43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 million reported listings

Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)

