Hi,

We have a 3-node Gluster setup in which each node acts as both server and
client.
Every few days, some $random file or directory does not exist according to
the FUSE mountpoint. When we try to access it (stat, cat, etc.), the
filesystem reports that the file/directory does not exist, even though it
does. When we try to create the file/directory, we receive the following
error, which is also logged in /var/log/glusterfs/bricks/$brick.log:

[2018-04-10 12:51:26.755928] E [MSGID: 113027] [posix.c:1779:posix_mkdir] 0-www-posix: mkdir of /storage/gluster/path/to/dir failed [File exists]

We don't see this issue on every server, only on the servers that did not
create the file/directory (i.e. 2 of the 3 Gluster nodes).
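To confirm the symptom on an affected node, one option is to compare what the FUSE mount reports with what the local brick holds. This is a minimal sketch, assuming the mount point /var/www and the brick directory /storage/gluster/www taken from the fstab entry and brick list in this mail; compare_views is a hypothetical helper, and it only reads the brick (it never writes to it).

```python
import os
import sys

# Assumed locations, taken from the fstab entry and brick list in this mail:
# the FUSE mount of volume "www" and the local brick directory.
MOUNT_ROOT = "/var/www"
BRICK_ROOT = "/storage/gluster/www"


def compare_views(rel_path, mount_root=MOUNT_ROOT, brick_root=BRICK_ROOT):
    """Return (visible_via_mount, present_on_brick) for a volume-relative path.

    The brick is only inspected with os.path.lexists; nothing is written.
    """
    rel_path = rel_path.lstrip("/")
    on_mount = os.path.lexists(os.path.join(mount_root, rel_path))
    on_brick = os.path.lexists(os.path.join(brick_root, rel_path))
    return on_mount, on_brick


if __name__ == "__main__":
    for rel in sys.argv[1:]:
        on_mount, on_brick = compare_views(rel)
        if on_brick and not on_mount:
            print(f"{rel}: present on brick, missing via FUSE mount")
        else:
            print(f"{rel}: mount={on_mount} brick={on_brick}")
```

Paths are given relative to the volume root, e.g. `python3 compare_views.py path/to/dir` (the script name is arbitrary) on one of the two affected nodes.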

We found that this issue does not resolve itself. However, running ls on
the parent directory resolves it on the other nodes.
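Until the root cause is found, the manual workaround (an ls on the parent directory) could be automated as a single retry. This is a minimal sketch under the assumption that a plain directory read of the parent is what clears the stale state; whether os.listdir triggers exactly the same FUSE requests as the ls binary has not been verified, and stat_with_parent_listing is a hypothetical name.

```python
import os


def stat_with_parent_listing(path):
    """Stat path; on ENOENT, list the parent directory once and retry.

    The listing mirrors the manual "ls on the parent" workaround. Returns
    (stat_result, retried). If the retry also fails, FileNotFoundError
    propagates to the caller.
    """
    try:
        return os.stat(path), False
    except FileNotFoundError:
        parent = os.path.dirname(os.path.abspath(path))
        # A plain readdir of the parent; assumed (not verified) to have the
        # same effect on the FUSE mount as running `ls` there.
        os.listdir(parent)
        return os.stat(path), True


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        st, retried = stat_with_parent_listing(p)
        print(f"{p}: ok (size={st.st_size}, retried={retried})")
```

A result with retried=True would also be a useful signal for logging how often the issue occurs.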

We are running GlusterFS 3.12.6 on Debian 8.

Mount options in /etc/fstab:
/dev/storage-gluster/gluster /storage/gluster xfs rw,inode64,noatime,nouuid 0 2
localhost:/www /var/www glusterfs backup-volfile-servers=10.0.0.2:10.0.0.3,log-level=WARNING 0 0

gluster volume info www

Volume Name: www
Type: Replicate
Volume ID: e0579d53-f671-4868-863b-ba85c4cfacb3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: n01c01-gluster:/storage/gluster/www
Brick2: n02c01-gluster:/storage/gluster/www
Brick3: n03c01-gluster:/storage/gluster/www
Options Reconfigured:
performance.read-ahead: on
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.md-cache-timeout: 600
diagnostics.brick-log-level: WARNING
network.ping-timeout: 3
features.cache-invalidation: on
server.event-threads: 4
performance.cache-invalidation: on
performance.quick-read: on
features.cache-invalidation-timeout: 600
network.inode-lru-limit: 90000
performance.cache-priority: *.php:3,*.temp:3,*:1
performance.nl-cache: on
performance.cache-size: 1GB
performance.readdir-ahead: on
performance.write-behind: on
cluster.readdir-optimize: on
performance.io-thread-count: 64
client.event-threads: 4
cluster.lookup-optimize: on
performance.parallel-readdir: off
performance.write-behind-window-size: 4MB
performance.flush-behind: on
features.bitrot: on
features.scrub: Active
performance.io-cache: off
performance.stat-prefetch: on

We suspected md-cache could be the cause, but its timeout
(performance.md-cache-timeout) is 600 seconds, which makes that unlikely:
the issue can persist for hours, until we run an ls to fix it.
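The md-cache argument is simple arithmetic: if a cached entry is dropped once its timeout elapses, a 600-second timeout cannot by itself explain a lookup that stays wrong for hours. The sketch below parses the cache timeout options from the volume info above (performance.md-cache-timeout and features.cache-invalidation-timeout, both 600) and reports which are shorter than an observed duration; the helper names are hypothetical, and the 2-hour figure is an example, since the mail only says "hours".

```python
# Cache-related options copied from the "gluster volume info www" output.
OPTIONS_TEXT = """\
performance.md-cache-timeout: 600
features.cache-invalidation-timeout: 600
"""

# Timeouts relevant to the md-cache reasoning (both in seconds).
CACHE_TIMEOUT_KEYS = (
    "performance.md-cache-timeout",
    "features.cache-invalidation-timeout",
)


def parse_options(text):
    """Turn 'key: value' lines into a dict of strings (split on the first colon)."""
    opts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            opts[key.strip()] = value.strip()
    return opts


def cache_timeouts_exceeded(opts, observed_seconds):
    """Return the cache timeouts that are shorter than the observed duration.

    Assumes a cached entry is discarded once its timeout elapses, so any
    timeout returned here cannot by itself explain a stale entry that
    persisted for observed_seconds.
    """
    return {
        key: int(opts[key])
        for key in CACHE_TIMEOUT_KEYS
        if key in opts and int(opts[key]) < observed_seconds
    }


if __name__ == "__main__":
    opts = parse_options(OPTIONS_TEXT)
    print(cache_timeouts_exceeded(opts, observed_seconds=2 * 3600))
```

With observed_seconds=7200 both options are reported: each 600-second window would have expired 12 times over (7200 / 600 = 12).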

Does anyone have an idea what could be causing this?

Thanks!
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
