I found the bug: it was my own fault. It turns out I neglected to actually pass the MPI_Info argument to the spawn call and provided MPI_INFO_NULL instead.
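For the record, a minimal sketch of the corrected spawn call (the child executable name "./spawn_child" and the process count are only placeholders, and "hostfile_spawn" stands for the hostfile I pass via the "hostfile" info key):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Build the info object instead of passing MPI_INFO_NULL. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "hostfile", "hostfile_spawn");

    /* "./spawn_child" and maxprocs = 2 are placeholders for illustration. */
    MPI_Comm_spawn("./spawn_child", MPI_ARGV_NULL, 2, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}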
My apologies for this mistake. Thank you for your efforts.

Sebastian

On Mar 22, 2013, at 1:10 PM, Sebastian Rinke wrote:

> Thanks for the quick response.
>
>>> I'm using OMPI 1.6.4 in a Torque-like environment.
>>> However, since there are modifications in Torque that prevent OMPI from
>>> spawning processes the way it does with MPI_COMM_SPAWN,
>>
>> That hasn't been true in the past - did you folks locally modify Torque to
>> prevent it?
>
> Plain Torque still supports the TM-based spawning as before.
> The problem is that the RM on the system I'm using is based on Torque with
> modifications.
>
>>> I want to circumvent Torque and use plain ssh only.
>>>
>>> So, I configured --without-tm and can successfully run mpiexec with
>>> -hostfile.
>>>
>>> Now I want to call MPI_COMM_SPAWN using the hostfile info argument.
>>>
>>> I start with
>>>
>>> $ mpiexec -np 1 -hostfile hostfile_all ./spawn_parent
>>>
>>> where hostfile_all is a superset of hostfile_spawn, which is provided in the
>>> info argument to MPI_COMM_SPAWN.
>>>
>>> The message I get is:
>>>
>>> --------------------------------------------------------------------------
>>> All nodes which are allocated for this job are already filled.
>>> --------------------------------------------------------------------------
>>
>> I'll take a look in the morning when my cluster comes back up - sounds like
>> we have a bug. However, note that there are no current plans for a 1.6.5
>> release, so I don't know how long it will be before any fix shows up.
>>
>> Meantime, I'll check the 1.7 series to ensure it works correctly there as
>> well.
>
> If it works with 1.7, that would already be fine for me.