On Aug 29, 2008, at 5:52 PM, Eugene Loh wrote:

I'm looking at the sm BTL.

Excellent! I hope you had a good dash of parmesan with that spaghetti code in there (the sm btl is among the hairiest sections in OMPI...). :-)

In mca_btl_sm_add_procs(), there's a loop over peer processes, with a call to ompi_fifo_init(). That is, one call to ompi_fifo_init() for each connection
[snip]
on page boundaries.

I *believe* your analysis is correct. It's been a while since I've looked at that section of code in detail, but what you say sounds reasonable.

As the number of local processes increases, therefore, these per-connection allocations become very costly. For 8K pages, for example, and 100 on-node processes, we're talking 3*100*100*8K = 240 Mbytes. For 512 on-node processes (yes, we have nodes this big), that's 6 Gbyte... most of which is unused. (E.g., allocating more than an 8K page when we only need 64 or 12 bytes.)
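For reference, the arithmetic above can be reproduced with a short sketch. This is only an illustration of the 3*N*N*pagesize estimate quoted in the message; the function name and parameter names are hypothetical and do not come from the OMPI source, and the factor of 3 is taken as-is from the figure above.

```python
def sm_fifo_footprint(n_local_procs, page_size=8 * 1024, allocs_per_connection=3):
    """Estimate total bytes if every per-connection allocation takes one full page.

    Hypothetical helper: mirrors the 3 * N * N * pagesize estimate from the
    message, not any actual function in the sm BTL.
    """
    return allocs_per_connection * n_local_procs * n_local_procs * page_size


if __name__ == "__main__":
    for n in (100, 512):
        total = sm_fifo_footprint(n)
        print(f"{n} on-node procs: {total} bytes ({total / 2**20:.1f} MiB)")
```

The quadratic term is what hurts: going from 100 to 512 processes multiplies the footprint by roughly 26, not by 5.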

Okay, long intro. Let me start with a short question: do we really need page alignment for these allocations? Would cacheline alignment be okay?
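To make the question concrete, here is a rough sketch comparing what a 64-byte or 12-byte request costs when rounded up to a page versus a cacheline. The 64-byte cacheline size is an assumption for illustration only (it varies by platform), and `align_up` is a hypothetical helper, not something from the OMPI code.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment (must be a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


PAGE = 8 * 1024   # 8K pages, as in the example above
CACHELINE = 64    # assumed cacheline size; platform-dependent

if __name__ == "__main__":
    for need in (64, 12):
        on_page = align_up(need, PAGE)
        on_line = align_up(need, CACHELINE)
        print(f"need {need} bytes: page-aligned {on_page}, cacheline-aligned {on_line}")
```

Under these assumptions each small allocation would occupy one cacheline instead of one page, which is where the bulk of the unused memory comes from.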

I believe the main rationale for doing page alignment was memory affinity, since (at least on Linux, I don't know about Solaris) you can only affinity-ize whole pages.

On your big 512 proc machines, I'm assuming that the page memory affinity will matter...?

That being said, we're certainly open to making things better. E.g., if a few procs share a memory locality (can you detect that in Solaris?), have them share a page or somesuch...? (totally open to ideas here)

--
Jeff Squyres
Cisco Systems
