Hi,

I've been trying to set up mpiBLAST-pio on an Opteron cluster that uses
NFS.  The system uses MPICH2 for communication.

When I run on 1 or 2 nodes, everything is fine.  Everything is also fine if
I use --use-master-write.  With more nodes and without that option, I get:

File locking failed in ADIOI_Set_lock. If the file system is NFS, you need
to use NFS version 3, ensure that the lockd daemon is running on all the
machines, and mount the directory with the 'noac' option (no attribute
caching).
p3_28926:  p4_error: : 1
[3] MPI Abort by user Aborting program !
[3] Aborting program!
rm_l_3_28927: (11.500000) net_send: could not write to fd=5, errno = 32
p2_22684:  p4_error: interrupt SIGx: 13
lcan233:/vol/xcae2/mlundbe $ 3  11.4648 Bailing out with signal -1
[3] MPI Abort by user Aborting program !
[3] Aborting program!
1       11.4688 Bailing out with signal -1
[1] MPI Abort by user Aborting program !
[1] Aborting program!
p1_5504:  p4_error: : 0
p2_22684: (17.578125) net_send: could not write to fd=5, errno = 32
p1_5504: (17.648438) net_send: could not write to fd=5, errno = 32
2       17.6133 Bailing out with signal -1
[2] MPI Abort by user Aborting program !
[2] Aborting program!
Abort


The first error message comes from ROMIO, the file I/O part of MPICH2, and a
lot of people on the internet have run into it.  Apparently locking with the
fcntl() call doesn't work on NFS unless it's set up as described.  Now, I'm
not an admin on this cluster, so I haven't been able to try many things out.
What I have determined, though, is that we do run NFS version 3 and the lockd
daemon is always running by default.  An admin said he put the 'noac' option
in the autofs command that mounts the relevant directories, but that may not
have worked because it no longer shows up when I grep for it in the mtab.
This is something I'll follow up on tomorrow.
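
In case it helps anyone reproduce this without admin rights, here is a rough
sketch of the two checks I have in mind, assuming the nodes run Linux and have
Python available.  The function names (mount_options, try_fcntl_lock) are my
own and not part of mpiBLAST or MPICH2.  The first looks up the mount options
for a directory in a mounts table such as /proc/mounts, which I believe
reflects what the kernel actually applied more reliably than the mtab does.
The second tries a non-blocking fcntl-style byte-range lock on a scratch file,
which is the same kind of lock the error message complains about, though not
necessarily the exact call sequence ROMIO uses.

```python
import fcntl
import os
import tempfile


def mount_options(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount containing path.

    mounts_text is the content of a mounts table in /proc/mounts format.
    The longest matching mount point wins; None means no match was found.
    """
    path = os.path.realpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type, options)
    return best


def try_fcntl_lock(directory):
    """Try to take and release a write lock on a scratch file in directory.

    Returns (True, "ok") on success or (False, error text) on failure.
    The scratch file is removed afterwards either way.
    """
    fd, name = tempfile.mkstemp(dir=directory, prefix="locktest_")
    try:
        try:
            # Lock the first byte only, without blocking, then release it.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
            return True, "ok"
        except OSError as exc:
            return False, str(exc)
    finally:
        os.close(fd)
        os.remove(name)
```

The idea would be to run this on every compute node against the output
directory, for example mount_options(outdir, open("/proc/mounts").read())
followed by try_fcntl_lock(outdir), and compare the results: a node where
'noac' is missing from the options, or where the lock attempt fails, would be
the first place to look.  A passing lock test on one node does not prove the
multi-node case works, so I would treat it only as a quick sanity check.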

Any advice would be appreciated.

Thanks,
Marcus Lundberg

