Jeff Squyres wrote:
I think even first-touch will make *the whole page* be local to the
process that touches it.
Right.
So if you have each process take N bytes (where N << page_size) from the
same page, the 0th process will make that whole page local to itself; it
may be remote for the others.
I think I'm not making myself clear. Read on...
*) You wouldn't need to control memory allocations with a lock
(except for multithreaded apps). I haven't looked at this too
closely yet, but the 3*n*n memory allocations in shared memory
during MPI_Init are currently serialized, which sounds disturbing
when n is 100 to 500 local processes.
If I'm understanding your proposal right, you're saying that each
process would create its own shared memory space, right? Then any
other process that wants to send to that process would mmap/shmattach/
whatever to the receiver's shared memory space. Right?
I don't think it's necessary to have each process have its own segment.
The OS manages the shared area on a per-page basis anyhow. All that's
necessary is that there is an agreement up front about which pages will
be local to which process. E.g., if there are P processes/processors
and the shared area has M pages per process, then there will be P*M
pages altogether. We'll say that the first M pages are local to process
0, the next M to process 1, and so on. That is, process 0 will first-touch
the first M pages, process 1 will first-touch the next M pages, etc. If
an allocation needs to be local to process i, then process i will
allocate it from its pages. Since only process i can allocate from
these pages, it does not need any lock protection to keep other
processes from allocating at the same time. And, since these pages have
the proper locality, then small allocations can all share common pages
(instead of having a separate page for each 12-byte or 64-byte allocation).
Clearer? One shared memory region, partitioned equally among all
processes. Each process first-touches its own pages to get the right
locality. Each allocation is made by the process to which it should be
local. Benefits include no multi-process locks and no need for page
alignment of tiny allocations.
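To make this concrete, here is a rough sketch of the layout I have in
mind, assuming one mmap'd segment shared by everyone. The names, the
M=64 figure, and the bump-pointer allocator are just illustrations on my
part, not actual Open MPI code:

#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define PAGES_PER_PROC 64   /* M: pages owned by each local process */

typedef struct {
    char   *base;   /* start of this process's partition           */
    size_t  used;   /* bump-pointer offset; no lock needed         */
    size_t  size;   /* M * page_size                               */
} local_pool_t;

/* Every process maps the same segment of P*M pages, so all
 * partitions are visible to all processes.                        */
static char *map_segment(int fd, int num_procs, size_t page_size)
{
    size_t len = (size_t) num_procs * PAGES_PER_PROC * page_size;
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Each process first-touches only its own M pages, so under the
 * default first-touch policy those pages end up local to it.      */
static void init_local_pool(local_pool_t *pool, char *seg_base,
                            int my_rank, size_t page_size)
{
    pool->size = PAGES_PER_PROC * page_size;
    pool->base = seg_base + (size_t) my_rank * pool->size;
    pool->used = 0;
    memset(pool->base, 0, pool->size);    /* the first touch */
}

/* Only the owning process allocates from its partition, so a plain
 * bump pointer suffices (no multi-process lock), and many small
 * allocations share pages instead of each taking a page.          */
static void *local_alloc(local_pool_t *pool, size_t nbytes)
{
    size_t aligned = (nbytes + 63) & ~(size_t) 63;   /* cacheline */
    if (pool->used + aligned > pool->size)
        return NULL;
    void *p = pool->base + pool->used;
    pool->used += aligned;
    return p;
}

Any process can still read and write structures allocated by another
process through the common mapping; only the allocation bookkeeping is
private to the owner, which is what removes the lock.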
The total amount of shared memory will likely not go down, because
the OS will still likely allocate on a per-page basis, right?
Total amount would go down significantly. Today, if you want to
allocate 64 bytes on a page boundary, you allocate 64+pagesize, a 100x
overhead. What I'm (evidently not so clearly) proposing is that we
establish a policy about what memory will be local to whom. With that
policy, we simply allocate our 64 bytes in the appropriate region. This
eliminates the need for page alignment (page is already in the right
place, shared by many allocations, all of which want to be there). You
could still want cacheline alignment... that's fine.
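To put rough numbers on it (assuming 4 KiB pages and taking the 3*n*n
64-byte allocations mentioned earlier as the workload; both figures are
just for illustration, not measurements):

#include <stdio.h>

int main(void)
{
    const long long page_size  = 4096;
    const long long alloc_size = 64;
    const long long n          = 512;        /* local processes       */
    const long long nallocs    = 3 * n * n;  /* MPI_Init allocations  */

    /* today: each allocation over-allocates by a page to guarantee
     * page alignment                                                 */
    long long aligned_bytes = nallocs * (alloc_size + page_size);
    /* proposed: allocations packed into the owner's pre-touched pages */
    long long pooled_bytes  = nallocs * alloc_size;

    printf("page-aligned today: %lld MiB\n", aligned_bytes >> 20);
    printf("pooled (proposed):  %lld MiB\n", pooled_bytes  >> 20);
    return 0;
}

With n around 512, that works out to roughly 3 GiB of over-allocation
today versus under 50 MiB for the same payload when small allocations
share pages.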
But per your 2nd point, would the resources required for each process
to mmap/shmattach/whatever 511 other processes' shared memory spaces
be prohibitive?
No need to have more shared memory segments. Just need a policy to say
how your global space is partitioned.
Graham, Richard L. wrote:
I have not looked at the code in a long time, so not sure how many
things have changed ... In general what you are suggesting is
reasonable. However, especially on large machines you also need to
worry about memory locality, so should allocate from memory pools
that are appropriately located. I expect that memory allocated on
a per-socket basis would do.
Is this what "maffinity" and "memory nodes" are about? If so, I
would think memory locality should be handled there rather than in
page alignment of individual 12-byte and 64-byte allocations.
maffinity was a first stab at memory affinity; it is currently (and has
been for a long, long time) no-frills, without a lot of thought put into
it.
I see the "node id" and "bind" functions in there; I think Gleb must
have added them somewhere along the way. I'm not sure how much
thought was put into making those be truly generic functions (I see
them implemented in libnuma, which AFAIK is Linux-specific). Does
Solaris have memory affinity function calls?
Yes, I believe so, though perhaps I don't understand your question.
Things like mbind() and numa_setlocal_memory() are, I assume, Linux
calls for placing some memory close to a process. I think the Solaris
madvise() call does this: give a memory range and say something about
how that memory should be placed -- e.g., the memory should be placed
local to the next thread to touch that memory. Anyhow, I think the
default policy is "first touch", so one could always do that.
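In case it helps, here is a tiny sketch of the Linux side using the
libnuma call mentioned above (link with -lnuma); the place_locally name
and the fallback to plain first touch are just my own illustration:

#include <numa.h>
#include <stddef.h>
#include <string.h>

/* Ask for the given range to be placed on the calling thread's node;
 * pages not yet faulted in will be allocated there on first touch.  */
static void place_locally(void *region, size_t len)
{
    if (numa_available() >= 0) {
        numa_setlocal_memory(region, len);
    } else {
        /* No libnuma support: touching the pages from the owning
         * process gives the same result under the default
         * first-touch policy.                                        */
        memset(region, 0, len);
    }
}

I have not checked what the exact Solaris madvise() spelling of that
would be.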
I'm not an expert on this stuff, but I just wanted to reassure you that
Solaris supports NUMA programming. There are interfaces for discovering
the NUMA topology of a machine (there is a hierarchy of "locality
groups", each containing CPUs and memory), for discovering in which
locality group you are, for advising the VM system where you want memory
placed, and for querying where certain memory is. I could do more
homework on these matters if it'd be helpful.