Hi Ralph,

> I don't know what version of OMPI you're working with, so I can't
> precisely pinpoint the line in question. However, it looks likely to
> be an error caused by not finding the PBS nodefile.

This is openmpi 1.6.5.

> We look in the environment for PBS_NODEFILE to find the directory
> where the file should be found, and then look for a file named with
> our Torque-assigned jobid in that place. The open failure indicates
> that it isn't there or isn't readable by us.

Does that mean that I misunderstand the --with-libpci switch for hwloc
and --enable-cpuset for torque? I had thought that this eliminates the
need for $PBS_NODEFILE.

> If you are on a network file system, then it's possible that Torque
> is creating the file on your server, but the compute node just isn't
> seeing it fast enough. You might look at potential NFS setup switches
> to speed-up the sync.

Indeed the compute nodes are NFS-mounted. I'll take a look at sync
parameters. Thanks for the pointer.

Cheers,
Andrej
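
For reference, the lookup described in the quoted text can be approximated with a small diagnostic run from inside a job on a compute node. This is only a sketch, not Open MPI's actual code: the helper name check_nodefile is hypothetical, and it assumes the job id is available in the PBS_JOBID environment variable that Torque sets for a job.

```python
import os


def check_nodefile(env=None):
    """Report whether the Torque nodefile looks usable from this host.

    Hypothetical helper: mimics the lookup described above (directory
    taken from PBS_NODEFILE, file named after the job id). It assumes
    the job id is in PBS_JOBID; this is not Open MPI's own code.
    """
    env = os.environ if env is None else env
    nodefile = env.get("PBS_NODEFILE")
    jobid = env.get("PBS_JOBID")
    if not nodefile:
        return ("PBS_NODEFILE is not set", None)
    if not jobid:
        return ("PBS_JOBID is not set", None)

    # Directory comes from PBS_NODEFILE, file name from the job id.
    directory = os.path.dirname(nodefile)
    candidate = os.path.join(directory, jobid)

    if not os.path.isdir(directory):
        return ("directory missing", candidate)
    if not os.path.isfile(candidate):
        return ("nodefile missing", candidate)
    if not os.access(candidate, os.R_OK):
        return ("nodefile not readable", candidate)

    # Count the non-empty lines listed in the nodefile.
    with open(candidate) as fh:
        entries = [line.strip() for line in fh if line.strip()]
    return ("ok: %d entries" % len(entries), candidate)


if __name__ == "__main__":
    status, path = check_nodefile()
    print(status, path)
```

If this reports "nodefile missing" on a compute node right after the job starts but "ok" a few seconds later, that would be consistent with the NFS sync delay suggested above.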