Just for the sake of completeness, when the test program fails in the expected fashion this is the message it prints:

Opening file 'read' in /gpfs/aaronFS/testFile mode. stride = 1048576 l_len = 262144
Non-zero return from fcntl. errno = 37 (No locks available)
Aborted

-Aaron

On 12/6/18 1:47 PM, Aaron Knister wrote:
I've been trying to chase down an error one of our users periodically sees with Intel MPI. The body of the error is this:

This requires fcntl(2) to be implemented. As of 8/25/2011 it is not. Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd F,cmd F_SETLKW/7,type F_RDLCK/0,whence 0) with return value FFFFFFFF and errno 25.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: No locks available
ADIOI_Set_lock:offset 0, length 8

When this happens, a new job is reading back in the checkpoint files that a previous job wrote. It is consistently the reading of previously written files that triggers this, although the occurrence is sporadic, and if the job retries enough times the error goes away.

The really curious thing is that there is only one byte-range lock per file per node open at any time, so the error 37 (I know it says 25, but that's actually in hex even though it's not prefixed with 0x), which means we are out of byte-range locks, is a little odd to me. The default is 200, but we should be nowhere near that.
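
To make the hex-versus-decimal point concrete, here is a minimal sketch of the conversion. The function names are made up for illustration and are not part of MPICH or of the attached test program; the text printed by strerror() depends on the platform.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse an errno value that was printed as bare
 * hex digits without a 0x prefix, as in "errno 25" above. */
int errno_from_hex(const char *digits)
{
    return (int)strtol(digits, NULL, 16);
}

/* Hypothetical helper: print the decimal value and the platform's
 * description of that errno. */
void describe_hex_errno(const char *digits)
{
    int code = errno_from_hex(digits);
    printf("hex %s = decimal %d (%s)\n", digits, code, strerror(code));
}
```

Calling describe_hex_errno("25") reports decimal 37, which matches the "errno = 37 (No locks available)" line printed by the test program.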

I've been frantically trying to chase this down with various MPI reproducers, but alas I came up short until this morning, when I gave up on the MPI approach and tried something a little simpler. I've discovered that the problem shows up with the following sequence:

- A file is opened by node A (a key requirement to reproduce seems to be that node A is *also* the metanode for the file. I've not been able to reproduce if node A is *not* the metanode)
- Node A acquires a bunch of write locks in the file
- Node B then also acquires a bunch of write locks in the file
- Node B then acquires a bunch of read locks in the file
- Node A then also acquires a bunch of read locks in the file

At that last step, Node A will experience the errno 37 attempting to acquire read locks.

Here are the actual commands to reproduce this (source code for fcntl_stress.c is attached):

Node A: rm /gpfs/aaronFS/testFile; dd if=/dev/zero of=/gpfs/aaronFS/testFile bs=1M count=4000
Node A: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024)) 1
Node B: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024)) 1
Node B: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024))
Node A: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024))
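
The attachment is not reproduced here. For readers without it, the following is a rough sketch of the kind of locking pass such a program might perform. It is not the actual fcntl_stress.c: the function name lock_pass, the choice to hold every lock until the file is closed, and the reading of the arguments as stride, lock length, and an optional trailing 1 meaning "write locks" are all assumptions inferred from the commands and the output above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch, not the attached program.
 * Takes one byte-range lock of `len` bytes every `stride` bytes across
 * the file, blocking on each with F_SETLKW. Write locks are requested
 * when want_write is non-zero, read locks otherwise.
 * Returns the number of locks taken, or -1 on the first failure. */
long lock_pass(const char *path, off_t stride, off_t len, int want_write)
{
    assert(stride > 0 && len > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        close(fd);
        return -1;
    }

    long taken = 0;
    for (off_t off = 0; off + len <= st.st_size; off += stride) {
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = want_write ? F_WRLCK : F_RDLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = off;
        fl.l_len = len;

        if (fcntl(fd, F_SETLKW, &fl) != 0) {
            int saved = errno; /* keep errno before any other call */
            fprintf(stderr, "Non-zero return from fcntl. errno = %d (%s)\n",
                    saved, strerror(saved));
            close(fd);
            return -1;
        }
        taken++;
    }

    /* Assumption: locks are held until here; closing the descriptor
     * releases every lock this process holds on the file. */
    close(fd);
    return taken;
}
```

Under those assumptions, lock_pass("/gpfs/aaronFS/testFile", 1024*1024, 256*1024, 1) would correspond to the invocations with the trailing 1, and the same call with 0 as the last argument to the read-lock invocations.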

Now that I've typed this out, I realize this really should be a PMR not a post to the mailing list :) but I thought it was interesting and wanted to share.

-Aaron


--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
