Healing time set to 120 seconds for now.
Just to make sure I understand: I need to take the output of "gluster
volume heal projects info" and put it in a file. Then, for each brick
listed in the output as having unhealed files, I look for each gfid from
that file in the brick's .glusterfs directory and delete the matching
file if it exists. If it doesn't exist, I don't worry about it.
So these bricks have unhealed entries listed:
/srv/gfs01/Projects/.glusterfs - 85 files
/srv/gfs05/Projects/.glusterfs - 58854 files
/srv/gfs06/Projects/.glusterfs - 58854 files
Script time!
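A minimal sketch of such a script, under a few assumptions: the heal info output has been saved to a file, each brick section starts with a line like "Brick host:/path", unhealed entries appear as "<gfid:...>" lines, and each gfid lives at .glusterfs/<first two hex chars>/<next two>/<gfid> under the brick. The function names are made up for illustration, and nothing is deleted unless remove=True is passed.

```python
import os
import re

# Assumed shape of a heal-info entry: "<gfid:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx>"
GFID_RE = re.compile(
    r"<gfid:([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})>"
)


def parse_heal_info(text):
    """Map each brick path to the list of gfids reported under it."""
    entries = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            # "Brick host:/srv/gfs01/Projects" -> "/srv/gfs01/Projects"
            brick = line[len("Brick "):].split(":", 1)[-1]
            entries[brick] = []
        elif brick is not None:
            match = GFID_RE.search(line)
            if match:
                entries[brick].append(match.group(1).lower())
    return entries


def gfid_path(brick, gfid):
    """Path of a gfid entry inside the brick's .glusterfs directory."""
    return os.path.join(brick, ".glusterfs", gfid[:2], gfid[2:4], gfid)


def stale_entries(entries, remove=False):
    """Return gfid paths that exist on disk; delete them only if remove=True."""
    found = []
    for brick, gfids in entries.items():
        for gfid in gfids:
            path = gfid_path(brick, gfid)
            # lexists also catches symlinks whose target is already gone
            if os.path.lexists(path):
                found.append(path)
                if remove:
                    os.remove(path)
    return found
```

Since the brick paths are local, this would have to run on each server that hosts a listed brick, for example stale_entries(parse_heal_info(open("heal-info.txt").read())) as a dry run first, with heal-info.txt being a hypothetical file name. Entries for bricks that do not exist on that server are simply not found and skipped.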
On 12/31/18 4:39 AM, Davide Obbi wrote:
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
cluster.quorum-reads no
Where exactly do I remove the gfid entries from - the .glusterfs
directory? --> Yes. I can't remember exactly where, but try a find
in the brick paths for the gfid; it should return something.
Where do I put the cluster.heal-timeout option - which file? -->
gluster volume set volumename option value
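As a concrete instance of that command form, here is a small sketch that builds the call for this thread's case. The volume name "projects" and the value 120 are taken from the messages above; the helper name is hypothetical, and actually running the command assumes the gluster CLI is on PATH on a server node.

```python
import subprocess


def heal_timeout_command(volume, seconds):
    """Build the argv for: gluster volume set <volume> cluster.heal-timeout <seconds>"""
    return ["gluster", "volume", "set", volume, "cluster.heal-timeout", str(seconds)]


def set_heal_timeout(volume, seconds):
    """Run the command; raises CalledProcessError if gluster reports failure."""
    subprocess.run(heal_timeout_command(volume, seconds), check=True)
```

Calling heal_timeout_command("projects", 120) only builds the argument list, so it can be printed and checked before set_heal_timeout is used.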
On Mon, Dec 31, 2018 at 10:34 AM Brett Holcomb wrote:
That is probably the case as a lot of files were deleted some time
ago.
I'm on version 5.2 but was on 3.12 until about a week ago.
Here is the quorum info. I'm running a distributed-replicated volume
in a 2 x 3 = 6 configuration.
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
cluster.quorum-reads no
Where exactly do I remove the gfid entries from - the .glusterfs
directory? Do I just delete all the directories and files under this
directory?
Where do I put the cluster.heal-timeout option - which file?
I think you've hit on the cause of the issue. Thinking back, we've had
some extended power outages, and due to a misconfiguration in the swap
file device name a couple of the nodes did not come up. I didn't
catch it for a while, so maybe the deletes occurred then.
Thank you.
On 12/31/18 2:58 AM, Davide Obbi wrote:
> If the long GFID does not correspond to any file, it could mean the
> file has been deleted by the client mounting the volume. I think this
> is caused when the delete was issued while the number of active bricks
> did not reach quorum majority, or when a second brick was taken down
> while another was down or had not finished the self-heal, the latter
> more likely.
> It would be interesting to see:
> - what version of glusterfs you are running; it happened to me with 3.12
> - volume quorum rules: "gluster volume get vol all | grep quorum"
>
> To clean it up, if I remember correctly, it should be possible to
> delete the gfid entries from the brick mounts on the glusterfs server
> nodes reporting the files to heal.
>
> As a side note, you might want to consider changing the self-heal
> timeout to a more aggressive schedule via the cluster.heal-timeout option.
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Davide Obbi
System Administrator
Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
Direct +31207031558
Booking.com <https://www.booking.com/>
Empowering people to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
million reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)