Hi folks

For those interested in trying it, I completed backporting the multicast grpcomm module from my branch over the last weekend. This allows all modex and other ORTE-level collective operations to occur via multicast, which significantly improves the performance of those operations.

To use it, you'll need to add --enable-multicast to your configure, and -mca grpcomm mcast to your command line. You'll also need a reasonably good UDP multicast environment. The new module will work with any launch environment.

I'm not really focused on scalability in my branch (mostly on resilience), but I did some quick experiments and found that the new module reduced modex time by quite a bit, depending on system and scale of course.

I hope to finish the rest of the backport over the next week or so. The last part will enable ALL ORTE system operations to be done via multicast, which eliminates things like the initial TCP connection flood back to the HNP when the daemons are launched. Again, I don't focus much on scalability, so anyone who wants to test that capability at scale is welcome to.

I'll send out another note when it is ready.

Ralph
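
A note on the UDP multicast requirement: one rough way to see whether multicast works at all on a node is to send a datagram to a group and check that it comes back. The Python sketch below does only that, on a single host. The group address, port, and function name are made up for illustration and are not what the grpcomm mcast module uses, and a passing result says nothing about whether multicast is forwarded between nodes.

```python
import socket
import struct

# Hypothetical values for illustration only; these are NOT the group or
# port used by the ORTE multicast module.
GROUP = "239.255.0.1"   # address in the administratively scoped range
PORT = 50000
PAYLOAD = b"mcast-check"


def multicast_loopback_ok(group=GROUP, port=PORT, timeout=2.0):
    """Return True if a datagram sent to the group is received on this host."""
    recv = send = None
    try:
        # Receiver: bind to the port and join the group on the default interface.
        recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        recv.bind(("", port))
        mreq = struct.pack("4s4s", socket.inet_aton(group),
                           socket.inet_aton("0.0.0.0"))
        recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        recv.settimeout(timeout)

        # Sender: keep the packet on the local subnet and loop it back locally.
        send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        send.sendto(PAYLOAD, (group, port))

        data, _ = recv.recvfrom(1024)
        return data == PAYLOAD
    except OSError:
        # Covers timeouts, bad addresses, missing multicast routes, refused joins.
        return False
    finally:
        for s in (recv, send):
            if s is not None:
                s.close()


if __name__ == "__main__":
    print("local multicast loopback:", "ok" if multicast_loopback_ok() else "FAILED")
```

For a check that means something for a cluster, the receiving half would need to run on one node and the sending half on another, since switches and routers are where multicast most often gets dropped.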