Thanks Ralph, this indeed fixed my problem. However, I have run into more trouble ...

I have a simple application that keeps spawning MPI processes: they exchange some data, then the children disconnect and vanish. I keep doing this in a loop, which is absolutely legal from the MPI standard's perspective. However, with the Open MPI trunk I run into two kinds of trouble:

1. I run out of fds. Apparently the orteds don't close the connections when the children disconnect, and after a few iterations I exhaust the available fds; the orteds start complaining and everything ends up being killed. If I check with lsof, I can see the pending fds (in an invalid state) still attached to the orted.

2. I tried to be helpful and provided a hostfile describing the cluster. I even annotated the nodes with the number of slots and max-slots. When we spawn processes we correctly load-balance them across the available nodes, but when they finish we do not release the resources. After a few iterations we run out of available nodes, and the application exits with the following error:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  ./slave

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

However, at this point there is only one MPI process running, the master. All other resources are fully available for the children.
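
The fd growth described in item 1 can also be tracked without lsof by counting the entries under /proc/<pid>/fd for an orted between iterations. The helper below is only a sketch: it is Linux-specific, the name count_open_fds is made up for illustration, and it is not part of Open MPI.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Count the open file descriptors of process `pid` by listing
 * /proc/<pid>/fd (Linux only). Returns -1 if the directory cannot be
 * opened, e.g. no such process or insufficient permissions.
 * Note: when a process inspects itself, the descriptor used to read
 * the directory is included in the count. */
int count_open_fds(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')   /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling this with the orted's pid before and after each spawn/disconnect cycle should show a steadily increasing count if the connections are indeed left open.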

I would like to get involved in this and help fix the two problems, but I have a hard time figuring out where to start. Any pointers would be welcome.
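
For anyone who wants to reproduce this, a master loop of the kind described above could look roughly like the sketch below. It is an illustration, not the actual test program: only ./slave and the 2 spawned processes come from the error output; the iteration count, the single-int exchange, and the slave's behavior are assumptions.

```c
/* master.c: spawn-and-disconnect loop (sketch).
 * Assumes a ./slave binary that calls MPI_Comm_get_parent, receives one
 * int from rank 0 of the parent, then calls MPI_Comm_disconnect on the
 * parent intercommunicator before MPI_Finalize. */
#include <mpi.h>
#include <stdio.h>

#define ITERATIONS   100  /* arbitrary; just enough to outlast the slots */
#define NUM_CHILDREN 2    /* matches the "2 slots" in the error above */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    for (int i = 0; i < ITERATIONS; i++) {
        MPI_Comm children;
        int errcodes[NUM_CHILDREN];

        /* With the default error handler, a failed spawn aborts the job. */
        MPI_Comm_spawn("./slave", MPI_ARGV_NULL, NUM_CHILDREN, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, errcodes);

        /* Exchange some data: send the iteration number to each child.
         * On an intercommunicator the destination is a remote-group rank. */
        for (int child = 0; child < NUM_CHILDREN; child++)
            MPI_Send(&i, 1, MPI_INT, child, 0, children);

        /* Collective over the intercommunicator; children call it too. */
        MPI_Comm_disconnect(&children);
        printf("iteration %d done\n", i);
    }

    MPI_Finalize();
    return 0;
}
```

Since every iteration disconnects before the next spawn, at most NUM_CHILDREN children should be alive at any time, so neither the fd count nor the slot usage should grow across iterations.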

  Thanks,
    george.

On Oct 28, 2008, at 10:50 AM, Ralph Castain wrote:

Done...r19820

On Oct 28, 2008, at 8:37 AM, Ralph Castain wrote:

Yes, of course it does - the problem is in a sanity check I just installed over the weekend.

Easily fixed...


On Oct 28, 2008, at 8:33 AM, George Bosilca wrote:

Ralph,

I ran into trouble with the new IO framework when I spawn a new process. The following error message is dumped and the job is aborted.

--------------------------------------------------------------------------
The requested stdin target is out of range for this job - it points
to a process rank that is greater than the number of process in the
job.

Specified target: INVALID
Number of procs: 2

This could be caused by specifying a negative number for the stdin
target, or by mistyping the desired rank. Please correct the cmd line
and try again.
--------------------------------------------------------------------------

Is the new IO framework supposed to support MPI-2 dynamics?

Thanks,
 george.

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

