On 04/19/2017 07:58 PM, Prentice Bisbal wrote:
> Here's the sequence of events:
>
> 1. First job(s) run fine on the node and complete without error.
>
> 2. Eventually a job fails with a 'permission denied' error when it tries
> to access /l/hostname.
So you don't get ESTALE, but you get EACCES? You *might* be able to fix this by setting the 'no_subtree_check' option in your /etc/exports. I don't remember the details exactly anymore, but without this option nfsd/exportfs check more thoroughly whether a dentry is still valid.

I don't think networking can be the cause of this. However, if a dentry/inode is evicted from the server-side cache, the server has to use the NFS file handle to recreate the inode and dentry from the underlying file system. I think EACCES is returned if something goes wrong while connecting that dentry to its parent dentry (I need to look up the exact details again; it's been a while since I had to deal with this).

You could try setting /proc/sys/vm/vfs_cache_pressure to a very low value (don't set it to 0, though). Depending on your file system and kernel version, this might help keep dentries/inodes in the cache and avoid running into this. (There was a bug until 3.10 that prevented this from working properly; I'm not sure whether the related patch series has been backported into vendor kernels.)

Btw, which kernel version and file system is your NFS server running?

Bernd

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
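The reply suggests two server-side tweaks but shows no code for either. What follows is only a minimal Python sketch, not something from the post. The first helper adds no_subtree_check to the option lists of a line in /etc/exports. The second lowers /proc/sys/vm/vfs_cache_pressure. The helper names add_no_subtree_check and set_vfs_cache_pressure are hypothetical. The exports handling assumes the simple "path client(options)" layout, and writing the real procfs file needs root on the NFS server.

```python
import re

# Location of the knob mentioned in the post (Linux procfs; writing needs root).
VFS_CACHE_PRESSURE = "/proc/sys/vm/vfs_cache_pressure"


def add_no_subtree_check(export_line: str) -> str:
    """Add 'no_subtree_check' to every parenthesised option list on an
    /etc/exports line, e.g. '/l node*(rw)' -> '/l node*(rw,no_subtree_check)'.

    Sketch only: assumes the simple 'path client(opts) ...' form; clients
    without an option list, quoted paths and continuation lines are not handled.
    """
    stripped = export_line.strip()
    if not stripped or stripped.startswith("#"):
        return export_line  # leave blank lines and comments untouched

    def fix(match: re.Match) -> str:
        # Split the existing options and drop an explicit subtree_check,
        # which would contradict the option being added.
        opts = [o for o in match.group(1).split(",") if o and o != "subtree_check"]
        if "no_subtree_check" not in opts:
            opts.append("no_subtree_check")
        return "(" + ",".join(opts) + ")"

    return re.sub(r"\(([^)]*)\)", fix, export_line)


def set_vfs_cache_pressure(value: int, path: str = VFS_CACHE_PRESSURE) -> int:
    """Write a new vfs_cache_pressure value to 'path' and return the old one.

    The change does not survive a reboot unless it is also added to the
    sysctl configuration (vm.vfs_cache_pressure).
    """
    if value <= 0:
        # The post warns against 0; per the kernel docs, 0 means dentries and
        # inodes are never reclaimed under memory pressure (risking OOM).
        raise ValueError("use a small positive value, not 0")
    with open(path) as f:
        old = int(f.read().strip())
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return old
```

For example, add_no_subtree_check("/l node*(rw,sync)") gives "/l node*(rw,sync,no_subtree_check)". After editing /etc/exports, the server still has to re-read its exports (e.g. with exportfs -ra). Any value passed to set_vfs_cache_pressure, such as 10 against the kernel default of 100, is only an illustrative choice and not a recommendation from the post.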