There was recent activity on the mailing lists where someone was
attempting to call comm_spawn 100,000 times. Setting aside the
threading issues that were the focus of that exchange, the fact is
that OMPI currently cannot handle that many comm_spawns.
The ORTE jobid is composed of two elements:
1. the top 16 bits are an "identifier" for that mpirun
2. the lower 16 bits are a running counter identifying the specific
job/launch for those procs.
Thus, with only 2^16 = 65,536 values available for the counter, we are
limited to roughly 64k comm_spawns per mpirun.
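
To make the layout concrete, here is a minimal sketch of the packing
described above. The names (example_jobid_t, example_make_jobid, etc.)
are hypothetical and chosen for illustration only; they are not the
actual ORTE macros or types, and the real code may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit jobid; not the real ORTE typedef. */
typedef uint32_t example_jobid_t;

#define EXAMPLE_LOCAL_JOBID_BITS 16u
#define EXAMPLE_LOCAL_JOBID_MAX  0xFFFFu

/* Pack the mpirun identifier into the top 16 bits and the
 * per-mpirun running counter into the lower 16 bits. */
static example_jobid_t example_make_jobid(uint16_t mpirun_id,
                                          uint16_t local_job)
{
    return ((example_jobid_t)mpirun_id << EXAMPLE_LOCAL_JOBID_BITS)
           | (example_jobid_t)local_job;
}

/* Extract the top 16 bits (mpirun identifier). */
static uint16_t example_mpirun_id(example_jobid_t jobid)
{
    return (uint16_t)(jobid >> EXAMPLE_LOCAL_JOBID_BITS);
}

/* Extract the lower 16 bits (running job/launch counter). */
static uint16_t example_local_job(example_jobid_t jobid)
{
    return (uint16_t)(jobid & EXAMPLE_LOCAL_JOBID_MAX);
}

/* Compute the jobid for the next launch under the same mpirun.
 * Returns 0 on success, or -1 once the 16-bit counter is exhausted,
 * since incrementing further would carry into the mpirun identifier. */
static int example_next_jobid(example_jobid_t current,
                              example_jobid_t *next)
{
    assert(next != NULL);
    if (example_local_job(current) == EXAMPLE_LOCAL_JOBID_MAX) {
        return -1;
    }
    *next = current + 1;
    return 0;
}
```

In this sketch, a request for 100,000 launches would hit the -1 case
well before completing, because the counter field has no room left
after 0xFFFF.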
Expanding this would require either revamping the entire way we handle
jobs (e.g., removing the mpirun identifier - major effort), or
expanding the orte_jobid_t from its current 32 bits to 64 bits.
Is this a problem we want to address?
Ralph