On Mon, Apr 16, 2018 at 1:54 PM, Niels Hendriks <ni...@nuvini.com> wrote:
> Hi,
>
> We have a 3-node gluster setup where gluster is both the server and the
> client.
> Every few days we have some $random file or directory that does not exist
> according to the FUSE mountpoint. When we try to access the file (stat,
> cat, etc...) the filesystem reports that the file/directory does not exist,
> even though it does. When we try to create the file/directory we receive
> the following error which is also logged in
> /var/log/glusterfs/bricks/$brick.log:
>
> [2018-04-10 12:51:26.755928] E [MSGID: 113027] [posix.c:1779:posix_mkdir]
> 0-www-posix: mkdir of /storage/gluster/path/to/dir failed [File exists]
>
> We don't see this issue on all of the servers, but only on the servers that
> did not create the file/directory (so 2 of the 3 gluster nodes).
>
> We found that this issue does not resolve itself automatically. However,
> when we perform an ls command on the parent directory the issue will be
> resolved for the other nodes.
>
> We are running glusterfs 3.12.6 on debian 8
>
> Mount-options in /etc/fstab:
> /dev/storage-gluster/gluster /storage/gluster xfs rw,inode64,noatime,nouuid 0 2
> localhost:/www /var/www glusterfs backup-volfile-servers=10.0.0.2:10.0.0.3,log-level=WARNING 0 0
>
> gluster volume info www
>
> Volume Name: www
> Type: Replicate
> Volume ID: e0579d53-f671-4868-863b-ba85c4cfacb3
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: n01c01-gluster:/storage/gluster/www
> Brick2: n02c01-gluster:/storage/gluster/www
> Brick3: n03c01-gluster:/storage/gluster/www
> Options Reconfigured:
> performance.read-ahead: on
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> performance.md-cache-timeout: 600
> diagnostics.brick-log-level: WARNING
> network.ping-timeout: 3
> features.cache-invalidation: on
> server.event-threads: 4
> performance.cache-invalidation: on
> performance.quick-read: on
> features.cache-invalidation-timeout: 600
> network.inode-lru-limit: 90000
> performance.cache-priority: *.php:3,*.temp:3,*:1
> performance.nl-cache: on
> performance.cache-size: 1GB
> performance.readdir-ahead: on
> performance.write-behind: on
> cluster.readdir-optimize: on
> performance.io-thread-count: 64
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.parallel-readdir: off
> performance.write-behind-window-size: 4MB
> performance.flush-behind: on
> features.bitrot: on
> features.scrub: Active
> performance.io-cache: off
> performance.stat-prefetch: on
>
> We suspected that the md-cache could be the cause, but it does have a
> timeout of 600 seconds so this would be strange since the issue can be
> present for hours (at which point we did an ls to fix it).
>
> Does anyone have an idea of what could be the cause of this?
>

For files, it could be because of:
* cluster.lookup-optimize being set to on
* the data file being present on a non-hashed subvol while the linkto file is
  absent on the hashed subvol

I see that lookup-optimize is on. Can you get the following information for
the problematic file?
* Name of the file
* All xattrs on the parent directory, from all bricks
* stat of the file, from all bricks where it is present
* All xattrs on the file, from all bricks where it is present

If you are seeing the problem on a directory:
* Name of the directory
* xattrs of the directory and of its parent, from all bricks

regards,
Raghavendra

> Thanks!
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
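
One possible way to gather the information requested above is sketched below. It is only an illustration, not an official Gluster tool: the helper name diagnostic_commands is hypothetical, the brick root /storage/gluster/www is taken from the volume info quoted above, and "path/to/dir" is a placeholder for the real path of the affected file or directory relative to the volume root. The script only prints the getfattr and stat commands. They would need to be run as root on each of the three bricks, since the trusted.* xattrs are not visible to ordinary users.

```python
import shlex
from pathlib import PurePosixPath

# Brick root as shown in the "gluster volume info www" output in this thread.
BRICK_ROOT = "/storage/gluster/www"


def diagnostic_commands(brick_root, rel_path):
    """Build the commands to run on ONE brick for a volume-relative path.

    Hypothetical helper: it returns, in order, the xattr dump of the parent
    directory, the xattr dump of the entry itself, and a stat of the entry.
    """
    target = PurePosixPath(brick_root) / rel_path.lstrip("/")
    parent = target.parent
    return [
        # -d dumps values, -m . matches every attribute name, -e hex prints
        # the values hex-encoded.
        f"getfattr -d -m . -e hex {shlex.quote(str(parent))}",
        f"getfattr -d -m . -e hex {shlex.quote(str(target))}",
        f"stat {shlex.quote(str(target))}",
    ]


if __name__ == "__main__":
    # "path/to/dir" is a placeholder; substitute the affected path.
    for cmd in diagnostic_commands(BRICK_ROOT, "path/to/dir"):
        print(cmd)
```

Running the printed commands on every brick and keeping the output per node makes it easy to compare the xattrs side by side.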