Hi, I've been trying to set up mpiBLAST-pio on an Opteron cluster that uses NFS. The system uses MPICH2 for communication.
When I run on 1 or 2 nodes, everything is fine. Everything is also fine if I use --use-master-write. Otherwise, I get:

File locking failed in ADIOI_Set_lock. If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
p3_28926: p4_error: : 1
[3] MPI Abort by user Aborting program !
[3] Aborting program!
rm_l_3_28927: (11.500000) net_send: could not write to fd=5, errno = 32
p2_22684: p4_error: interrupt SIGx: 13
lcan233:/vol/xcae2/mlundbe $ 3 11.4648 Bailing out with signal -1
[3] MPI Abort by user Aborting program !
[3] Aborting program!
1 11.4688 Bailing out with signal -1
[1] MPI Abort by user Aborting program !
[1] Aborting program!
p1_5504: p4_error: : 0
p2_22684: (17.578125) net_send: could not write to fd=5, errno = 32
p1_5504: (17.648438) net_send: could not write to fd=5, errno = 32
2 17.6133 Bailing out with signal -1
[2] MPI Abort by user Aborting program !
[2] Aborting program!
Abort

The first error message comes from the file I/O part of MPICH2, and a lot of people on the internet have run into it. Apparently locking with fcntl doesn't work on NFS unless the file system is set up as the message describes.

I'm not an admin on this cluster, so I haven't been able to try many things out. What I have determined, though, is that we do run NFS version 3 and that the lockd daemon is always running by default. An admin said he put the 'noac' option in the autofs command that mounts the relevant directories, but that may not have worked, because the option doesn't show up when I grep for it in the mtab. This is something I'll follow up on tomorrow.

Any advice would be appreciated.

Thanks,
Marcus Lundberg
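
For anyone who wants to check the two conditions from the error message independently of mpiBLAST, here is a minimal Python sketch under some assumptions. It tries a non-blocking fcntl write lock on a scratch file in a given directory, and it pulls the option list for a mount point out of mtab-style text so you can see whether 'noac' is really there. The function names try_fcntl_lock and mount_options are made up for this example, and this is not the exact sequence of calls that MPICH2's I/O layer makes, so a success here on one node does not prove the multi-node case works.

```python
import fcntl
import os
import tempfile


def try_fcntl_lock(directory):
    """Try to take and release an fcntl write lock on a scratch file.

    Returns None if the lock was granted, otherwise the OSError that the
    lock attempt raised. (Hypothetical helper, not part of mpiBLAST or MPICH2.)
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # lockf() wraps the fcntl() locking calls; LOCK_NB makes the
            # request non-blocking so a broken lock service cannot hang us.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return None
        except OSError as err:
            return err
    finally:
        os.close(fd)
        os.unlink(path)


def mount_options(mounts_text, mount_point):
    """Return the list of mount options for mount_point, or None.

    mounts_text is the content of an mtab-style file, where each line is
    'device mountpoint fstype options dump pass'. The last matching entry
    wins, since later mounts shadow earlier ones.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")
    return found
```

To use it, call try_fcntl_lock() on the NFS directory from each compute node, and pass the contents of /etc/mtab (or /proc/mounts) on that node to mount_options() together with the exact mount point, then look for 'noac' in the returned list.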
