Jeff Squyres wrote:
> On Aug 29, 2008, at 5:52 PM, Eugene Loh wrote:
>> I'm looking at the sm BTL.
> Excellent! I hope you had a good dash of parmesan with that
> spaghetti code in there (the sm btl is among the hairiest sections
> in OMPI...). :-)
There's probably some law of software engineering that applies here.
Basically, upon first read, I was filled with bitter resentment against
those who had written the code. :^) Then, as I began to feel mastery
over its, um, intricacies -- to feel that I, too, was becoming a member
of the inner cabal -- I began to feel pride and a desire to protect the
code against intrusive overhaul. :^)
I did peek at some of the Open MPI papers, and they talked about Open
MPI's modular design. The idea is that someone should be able to play
with one component of the architecture without having to become an
expert in the whole thing. The reality I seem to be facing is that to
understand one part (like the sm BTL), I have to understand many parts
(mpool, allocator, common, etc.), and the only way to do so is to read
the code, step through it with a debugger, and ask experts.
> I believe the main rationale for doing page-size alignments was for
> memory affinity, since (at least on Linux, I don't know about
> Solaris) you can only affinity-ize pages.
Solaris maps on a per-page basis.
> On your big 512 proc machines, I'm assuming that the page memory
> affinity will matter...?
You mean for latency? I could imagine so, but don't know for sure. I'm
no expert on this stuff. Theoretically, I could imagine a system where
some of this stuff might fly from cache to cache, with the location of
the backing memory not being relevant.
If locality did matter, I could imagine two reasonable choices: FIFOs
being local to the sender or to the receiver -- with the best choice
depending on the system.
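To make the sender-local/receiver-local distinction concrete, here is a
minimal sketch of a single-producer/single-consumer FIFO living in a
shared segment. The names are made up (this is not the actual sm BTL
data structure), and memory barriers are omitted for brevity. On a
first-touch system, whichever process faults the FIFO's pages in first
determines where the backing memory lands, so "receiver-local" just
means the receiver initializes it:

/* Hypothetical sketch only -- not the actual sm BTL structures.
 * Single-producer/single-consumer FIFO; memory barriers omitted. */
#include <stddef.h>
#include <string.h>

#define FIFO_SLOTS 128

typedef struct {
    volatile size_t head;             /* advanced only by the sender   */
    volatile size_t tail;             /* advanced only by the receiver */
    void * volatile slot[FIFO_SLOTS]; /* pointers into shared memory   */
} sm_fifo_t;

/* Whoever calls this first faults the pages in; with first-touch, the
 * FIFO's backing memory then lives near that process.  Call it from
 * the receiver for a receiver-local FIFO, from the sender for a
 * sender-local one. */
static void sm_fifo_init(sm_fifo_t *f)
{
    memset(f, 0, sizeof(*f));
}

/* Sender side: returns 0 on success, -1 if the FIFO is full. */
static int sm_fifo_write(sm_fifo_t *f, void *frag)
{
    size_t next = (f->head + 1) % FIFO_SLOTS;
    if (next == f->tail) return -1;
    f->slot[f->head] = frag;
    f->head = next;                   /* publish the slot */
    return 0;
}

/* Receiver side: returns NULL if the FIFO is empty. */
static void *sm_fifo_read(sm_fifo_t *f)
{
    if (f->tail == f->head) return NULL;
    void *frag = f->slot[f->tail];
    f->tail = (f->tail + 1) % FIFO_SLOTS;
    return frag;
}

Note that either placement is the same code; only which process calls
sm_fifo_init() changes.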
> That being said, we're certainly open to making things better. E.g.,
> if a few procs share a memory locality (can you detect that in
> Solaris?), have them share a page or somesuch...?
Yes, I believe you can detect these things in Solaris.
I could imagine splitting the global shared memory segment up per
process (see the sketch after this list). This might have two advantages:
*) If the processes are bound and there is some sort of first-touch
policy, you could manage memory locality just by having the right
process make the allocation. No need for page alignment of tiny
allocations.
*) You wouldn't need to control memory allocations with a lock (except
for multithreaded apps). I haven't looked at this too closely yet, but
the 3*n*n memory allocations in shared memory during MPI_Init are
currently serialized, which sounds disturbing when n is 100 to 500 local
processes -- at n=500, that's on the order of 750,000 serialized
allocations.
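Here is the sort of thing I mean -- a rough sketch, assuming bound
processes and a first-touch policy. The names and layout are invented
for illustration (this is not the existing mpool/allocator interface).
Each process faults in its own slice of the one segment, then
bump-allocates from it with no lock:

/* Hypothetical illustration only -- not the Open MPI mpool/allocator
 * API.  Assumes the segment is page aligned and segment_size is a
 * multiple of 64 * nprocs. */
#include <stddef.h>
#include <string.h>

typedef struct {
    char   *base;   /* start of this process's slice of the segment */
    size_t  size;   /* slice size                                   */
    size_t  used;   /* bump pointer; private to this process        */
} my_slice_t;

/* Divide one big shared segment into equal per-process slices.  Each
 * process touches only its own slice, so with first-touch the pages
 * land near wherever that process is bound. */
static void slice_init(my_slice_t *s, char *segment,
                       size_t segment_size, int my_rank, int nprocs)
{
    s->size = segment_size / nprocs;
    s->base = segment + (size_t) my_rank * s->size;
    s->used = 0;
    memset(s->base, 0, s->size);   /* first touch: fault pages in here */
}

/* Lock-free bump allocation: only the owning process allocates from
 * its slice, so no mutual exclusion is needed (single-threaded MPI).
 * Alignment is cache-line sized, not page sized -- locality is already
 * handled by first touch. */
static void *slice_alloc(my_slice_t *s, size_t len)
{
    size_t aligned = (len + 63) & ~(size_t) 63;  /* round up to 64 */
    if (s->used + aligned > s->size) return NULL;
    void *p = s->base + s->used;
    s->used += aligned;
    return p;
}

Everything still lives in the single mapped segment, so other processes
can read and write what gets allocated; only the allocation bookkeeping
becomes per-process, which removes both the lock and the need to
page-align tiny allocations.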
Graham, Richard L. wrote:
> I have not looked at the code in a long time, so I am not sure how
> many things have changed ... In general, what you are suggesting is
> reasonable. However, especially on large machines you also need to
> worry about memory locality, so you should allocate from memory pools
> that are appropriately located. I expect that memory allocated on a
> per-socket basis would do.
Is this what "maffinity" and "memory nodes" are about? If so, I would
think memory locality should be handled there rather than in page
alignment of individual 12-byte and 64-byte allocations.
> Having said that, I have no clue how easy this is to implement within
> the current code base,
Yeah, we start in the sm BTL, but then go into mpool, class, common, and
allocator.
> but expect you can rely on first-touch after the procs are locked down
> to simplify the implementation.
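Right. For what it's worth, "first-touch after lock-down" could look
something like this Linux-flavored sketch -- on Solaris the binding step
would be processor_bind() instead of sched_setaffinity(), and the size
and CPU choice here are purely illustrative:

/* Sketch: bind first, then touch, so pages land on the local node.
 * Linux calls shown; a real sm segment would be a file-backed
 * MAP_SHARED mapping rather than anonymous memory. */
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* 1. Lock this process down (here: CPU 0, purely illustrative). */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) return 1;

    /* 2. Map memory; no physical pages are placed yet. */
    size_t len = 1 << 20;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;

    /* 3. First touch: each write faults a page in, and a first-touch
     *    policy places it on the node we are now bound to. */
    memset(buf, 0, len);

    munmap(buf, len);
    return 0;
}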
Thanks for the discussion and insights.