Re: [OMPI devel] Comm_spawn limits

Jeff Squyres Mon, 27 Oct 2008 18:14:16 -0400

On Oct 27, 2008, at 5:52 PM, Andreas Schäfer wrote:

> I don't know any implementation details, but is making a 16-bit
> counter a 32-bit counter really so much harder than this fancy
> (overengineered? ;-) ) table construction? The way I see it, this
> table might become a real mess if there are multiple
> MPI_Comm_spawn calls issued simultaneously in different
> communicators. (Would that be legal MPI?)

FWIW, all the spawns are proxied back to the HNP (i.e., mpirun), so
there would only be a need for 1 table. I don't think that a simple
table lookup is overengineered. :-) It's a simple solution to the
"need a global ID" issue. By limiting the size of the table, you can
avoid scalability issues as MPI jobs are being run on more and more
cores (e.g., growing without bound, particularly for 99% of the apps
out there that never call comm_spawn).
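
To make the "simple table lookup" idea concrete, here is a minimal
sketch of a bounded ID table owned by a single process (standing in
for the HNP). The class and method names are hypothetical and this is
not Open MPI's actual code; it only illustrates that a fixed-size
table hands out unique IDs and lets finished jobs give theirs back.

```python
class JobIdTable:
    """Fixed-size table of job IDs owned by one process (hypothetical sketch)."""

    def __init__(self, capacity):
        # The table never grows, so its memory cost is bounded up front.
        self.capacity = capacity
        self.in_use = [False] * capacity

    def allocate(self):
        """Return the lowest free ID, or raise if every slot is taken."""
        for job_id, used in enumerate(self.in_use):
            if not used:
                self.in_use[job_id] = True
                return job_id
        raise RuntimeError("job ID table is full")

    def release(self, job_id):
        """Mark an ID as free again once its job has finished."""
        self.in_use[job_id] = False
```

Because every spawn request goes through the one owner, no locking
or cross-process agreement is needed in this sketch; simultaneous
spawns from different communicators would simply be served one after
another.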

We actually went down to 16 bits recently (it used to be 32) as one
item toward reducing the memory footprint of MPI processes (and mpirun
and the orteds), particularly when running very large scale jobs. So
while increasing this one value back to 32 bits may not be tragic, it
would be nice to keep it down as 16 bits (IMHO).

Regardless of how big the value is (8, 16, 32, 64...) you still need a
unique value for comm_spawn. Therefore, some kind of duplicate
detection mechanism is needed. If you increase the size of the
container, you decrease the probability of collision, but it can still
happen. And since machines are growing in size and # of cores, it
could just delay the probability of collision until someone runs on a
big enough machine. Regardless, I'd prefer to fix it the Right way
rather than rely on probability to prevent a problem. In my
experience, "that could *never* happen!" is just an invitation for a
disaster, even if it's 1-5 years in the future. (Didn't someone say
that we'd never need more than 640k of RAM? :-) )
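
As a rough illustration of why a wider counter only postpones the
problem, the sketch below compares a plain wrapping counter with one
that skips IDs still held by live jobs. The function names and the
`live_ids` set are assumptions made up for this example, not anything
taken from the Open MPI code base.

```python
def naive_next(counter, bits=16):
    """Plain wrapping counter: no idea whether the new value is still in use."""
    return (counter + 1) % (1 << bits)


def checked_next(counter, live_ids, bits=16):
    """Wrapping counter that skips any ID still held by a live job."""
    size = 1 << bits
    for step in range(1, size + 1):
        candidate = (counter + step) % size
        if candidate not in live_ids:
            return candidate
    raise RuntimeError("all IDs are in use")


# A long-running job holds ID 0 while 2**16 further spawns take place.
counter = 0
for _ in range(1 << 16):
    counter = naive_next(counter)
# The naive counter has wrapped around to 0 again, which is a collision.

# With duplicate detection, the ID after 65535 skips the live ID 0.
safe = checked_next(65535, {0})
```

Changing `bits` to 32 makes the wraparound take longer to reach, but
the naive version still collides eventually; only the check against
the live IDs removes the problem at any width.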


Just my IMHO, of course... (and I'm not the guy writing the code!)  :-)

--
Jeff Squyres
Cisco Systems
