Hi folks

For those interested in trying it, I finished backporting the multicast 
grpcomm module from my branch over the weekend. It allows all modex and 
other ORTE-level collective operations to run over multicast, which 
significantly improves the performance of those operations.

To use it, add --enable-multicast to your configure line and 
-mca grpcomm mcast to your command line. You'll also need a reasonably good 
UDP multicast environment. The new module works with any launch environment.
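
For anyone unsure whether UDP multicast works at all on a given machine, a 
rough check along the following lines may help. This is only a sketch and is 
not part of Open MPI: the group address 239.255.42.99, the port 50000, and 
both function names are arbitrary choices made for illustration. It only 
exercises multicast loopback on a single host, so a pass says nothing about 
whether switches and routers forward multicast between nodes.

```python
import socket


def build_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    # Group address followed by the local interface address (any interface).
    return socket.inet_aton(group) + socket.inet_aton(iface)


def multicast_loopback_ok(group: str = "239.255.42.99",
                          port: int = 50000,
                          timeout: float = 2.0) -> bool:
    """Send one datagram to a multicast group and try to receive it locally.

    Returns True if the datagram comes back, False on timeout or socket error.
    The group and port defaults are hypothetical values, not Open MPI settings.
    """
    payload = b"mcast-probe"
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Receiver: bind to the port and join the group.
        recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        recv.bind(("", port))
        recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group))
        recv.settimeout(timeout)

        # Sender: keep the packet on the local segment and loop it back.
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        send.sendto(payload, (group, port))

        data, _addr = recv.recvfrom(1024)
        return data == payload
    except OSError:
        # Covers timeouts as well as hosts with no multicast-capable route.
        return False
    finally:
        recv.close()
        send.close()
```

Calling multicast_loopback_ok() returns False rather than raising when the 
host has no usable multicast route. A fuller test would run the receiver and 
the sender on different nodes.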

My branch is focused mostly on resilience rather than scalability, but 
I ran some quick experiments and found that the new module reduced modex time 
by quite a bit, depending on the system and scale, of course.

I hope to finish the backport over the next week or so. The last part will 
enable ALL ORTE system operations to be done via multicast, which eliminates 
things like the initial TCP connection flood back to the HNP when the daemons 
are launched. Again, I don't focus much on scalability, so anyone who wants to 
test that capability at scale is welcome to. I'll send out another note when 
it is ready.

Ralph

