Hi Ralph,

> I don't know what version of OMPI you're working with, so I can't
> precisely pinpoint the line in question. However, it looks likely to
> be an error caused by not finding the PBS nodefile.

This is Open MPI 1.6.5.

> We look in the environment for PBS_NODEFILE to find the directory
> where the file should be found, and then look for a file named with
> our Torque-assigned jobid in that place. The open failure indicates
> that it isn't there or isn't readable by us.
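
A rough Python sketch of the lookup as described above. This is not Open MPI's actual code, which is C and may differ in detail. It assumes the job id is read from $PBS_JOBID and that the file is expected in the same directory as $PBS_NODEFILE. The function name find_torque_nodefile is hypothetical. Run on a compute node inside a job, it would distinguish a missing file from an unreadable one:

```python
import os


def find_torque_nodefile(env=None):
    """Hypothetical sketch of the nodefile lookup described in the quote.

    Assumption: the directory comes from $PBS_NODEFILE and the file name is
    the Torque job id, taken here from $PBS_JOBID.
    Returns (path, problem); problem is None when the file is readable.
    """
    env = os.environ if env is None else env
    nodefile = env.get("PBS_NODEFILE")
    jobid = env.get("PBS_JOBID")  # assumption: job id comes from $PBS_JOBID
    if not nodefile or not jobid:
        return None, "PBS_NODEFILE or PBS_JOBID not set"
    # Directory part of $PBS_NODEFILE plus the job id as the file name.
    path = os.path.join(os.path.dirname(nodefile), jobid)
    if not os.path.exists(path):
        return path, "missing"
    if not os.access(path, os.R_OK):
        return path, "not readable"
    return path, None
```

For example, print(find_torque_nodefile()) inside a job script shows which of the two failure cases applies, if either.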

Does that mean that I have misunderstood the --with-libpci switch for hwloc
and --enable-cpuset for Torque? I had thought that these eliminate the
need for $PBS_NODEFILE.

> If you are on a network file system, then it's possible that Torque
> is creating the file on your server, but the compute node just isn't
> seeing it fast enough. You might look at potential NFS setup switches
> to speed up the sync.
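
One hedged way to test the "not seeing it fast enough" hypothesis, independent of Open MPI, is to poll for the file at the very start of a job and measure how long it takes to become visible on the compute node. The helper name wait_for_file and its timeout and interval defaults are illustrative assumptions, not part of Open MPI or Torque:

```python
import os
import time


def wait_for_file(path, timeout=10.0, interval=0.5):
    """Poll until `path` is a readable file or `timeout` seconds elapse.

    Returns the elapsed time in seconds, or None if it never became visible.
    """
    start = time.monotonic()
    while True:
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Called as wait_for_file(os.environ["PBS_NODEFILE"]) in a job script, a result noticeably above zero (or None) would point at NFS propagation delay rather than a missing file.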

Indeed, the compute nodes are NFS-mounted. I'll take a look at the sync
parameters. Thanks for the pointer.

Cheers,
Andrej
