I've seen this now and again, and we don't know for sure what the
problem is.  My best guess is that it is triggered by some sort
of OS limit on sockets on some operating systems.  When MPICH-G2
starts up, in MPI_Init(), *every* process calls listen() (with
backlog=1) on an ephemeral port (one assigned by the OS), whereas
connect() is only called on demand.

A simple test would be to write a small program that does nothing but
a listen() (meticulously checking the error code of every OS call) and
then sleeps for a couple of minutes.  Use Globus to launch it *on the
same system* where you're seeing the MPICH-G2 problem, raising the
(count=xxx) value to 337 or higher; my best guess is that you'll
trigger the same error.  If so, that's the problem (an OS limit being
hit), and one possible solution is to ask the sysadmin whether that
limit can be raised.

Nick

On Jun 8, 2007, at 2:16 PM, [EMAIL PROTECTED] wrote:

Hello,

Reposting; maybe someone has encountered this before... On two different clusters, one running Fedora and the other Debian, it seems I cannot launch more than __337__ MPICH-G2 processes (using only machines inside the cluster). Launching that many works with simple programs, but not with programs compiled with MPICH-G2, even one as simple as just MPI_Init + MPI_Finalize and nothing more.

Could anyone DISPROVE this (i.e., someone currently launching more than 337 MPICH-G2 processes) or, for that matter, confirm it? Is there a known cause for this? Could someone offer some clues?

Have a nice day, everyone!

Alex