Hi folks,

We seem to have hit a problem here: it looks like we are running into a built-in limit on the number of communicators a program can create. The program (a minimized reproducer for the real code) basically runs a loop. Each time through the loop it calls MPI_Comm_split to create a sub-communicator, does a reduce operation across the members of that sub-communicator, and then calls MPI_Comm_free to release it. After 64k times through the loop, the program fails.

This looks remarkably like a 16-bit index that hits a max value and then blocks. I have looked at the communicator code, but I don't immediately see such a field. Is anyone aware of some other place where we would have a limit that would cause this problem?

Thanks,
Ralph
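
To make the suspicion concrete, here is a toy model in plain C of the kind of bug that would produce this symptom. It is not MPI code and is not taken from any MPI implementation: every name in it (toy_id_pool, toy_comm_create, toy_comm_free, toy_iterations_until_failure) is hypothetical. It assumes that each new communicator is given an id that must fit in 16 bits and that freeing a communicator never returns its id for reuse. Whether anything like this actually exists in the library is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical, not from any MPI library. */

#define TOY_MAX_IDS 65536u  /* 2^16 distinct values fit in a 16-bit field */

typedef struct {
    /* Next id to hand out. Kept wider than 16 bits so that
     * exhaustion can be detected instead of silently wrapping. */
    uint32_t next_id;
} toy_id_pool;

/* Hand out the next 16-bit id, or report failure once all are used. */
static bool toy_comm_create(toy_id_pool *pool, uint16_t *id_out)
{
    assert(pool != NULL && id_out != NULL);
    if (pool->next_id >= TOY_MAX_IDS) {
        return false;  /* id space exhausted */
    }
    *id_out = (uint16_t)pool->next_id;
    pool->next_id++;
    return true;
}

/* The assumed bug: freeing does NOT give the id back to the pool. */
static void toy_comm_free(toy_id_pool *pool, uint16_t id)
{
    assert(pool != NULL);
    (void)id;  /* intentionally not recycled */
}

/* Mimic the create/use/free loop and count successful iterations. */
static unsigned long toy_iterations_until_failure(void)
{
    toy_id_pool pool = { 0 };
    unsigned long iterations = 0;
    uint16_t id;

    while (toy_comm_create(&pool, &id)) {
        /* ... the reduce on the sub-communicator would happen here ... */
        toy_comm_free(&pool, id);
        iterations++;
    }
    return iterations;
}
```

In this model toy_iterations_until_failure() returns 65536, even though only one "communicator" is ever alive at a time. That matches the pattern described above: the failure depends on the total number of creations, not on how many communicators exist at once.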