I have not looked at the code in a long time, so I'm not sure how much has
changed ...  In general, what you are suggesting is reasonable.  However,
especially on large machines, you also need to worry about memory locality, so
you should allocate from memory pools that are appropriately located.  I expect
that memory allocated on a per-socket basis would do.  Having said that, I have
no clue how easy this is to implement within the current code base, but I expect
you can rely on first-touch page placement after the procs are locked down
(bound to their processors) to simplify the implementation.
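The first-touch idea can be sketched on Linux as follows. This is a minimal illustration under stated assumptions, not Open MPI code: pin_to_first_allowed_cpu() and alloc_first_touch() are hypothetical helpers, the sketch uses a private anonymous mapping rather than the shared sm segment, and it assumes the kernel's default first-touch NUMA policy, under which a page is placed on the node of the CPU that first writes to it. The process binds itself first and only then touches one byte per page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: bind the calling process to the first CPU in its
 * current affinity mask. Returns that CPU number, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
    }
    return -1;
}

/* Hypothetical helper: map 'len' bytes and write one byte in every page.
 * Assuming a first-touch policy, each page is then backed by memory on the
 * NUMA node of the CPU doing the writing, so call this only after pinning.
 * Returns NULL on failure. */
static char *alloc_first_touch(size_t len)
{
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    for (size_t off = 0; off < len; off += page)
        ((volatile char *)p)[off] = 0;  /* first write faults the page in */
    return p;
}
```

The order matters: touching the pages before the process is bound could place them on whichever node the process happened to be running on at that moment.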

Rich

----- Original Message -----
From: devel-boun...@open-mpi.org <devel-boun...@open-mpi.org>
To: de...@open-mpi.org <de...@open-mpi.org>
Sent: Fri Aug 29 20:52:10 2008
Subject: [OMPI devel] allocating sm memory with page alignment

(I'm new to Open MPI.)

I'm looking at the sm BTL.

In mca_btl_sm_add_procs(), there's a loop over peer processes, with a 
call to ompi_fifo_init().  That is, one call to ompi_fifo_init() for 
each connection (sender/receiver pair).

In ompi_fifo_init(), there's an allocation of 
sizeof(ompi_cb_fifo_wrapper_t), and a call to ompi_cb_fifo_init(), which 
in turn has two allocations:  one of a bunch of pointers and another of 
sizeof(ompi_cb_fifo_ctl_t).

In short, for each connection, there are three allocations:

*) sizeof(ompi_cb_fifo_wrapper_t)... about 64 bytes on LP64
*) a bunch of pointers... about 1 Kbyte on LP64
*) sizeof(ompi_cb_fifo_ctl_t)... about 12 bytes

Let me say this yet another way.  For N local processes, there are
N*(N-1) connections and therefore 3*N*(N-1) per-connection allocations,
most of which are 64 bytes or smaller.

BUT, in ompi_fifo_init() and ompi_cb_fifo_init(), we ask for page
alignment of each allocation.  Moreover, mca_mpool_sm_alloc() itself
enforces that alignment to page boundaries.
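To see what page alignment costs per request, here is a sketch of the usual power-of-two round-up, assuming (as described above) that every allocation must start on its own page boundary. align_up() and padding() are illustrative helpers, not functions from the Open MPI source.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'size' up to the next multiple of 'align', which must be a
 * power of two. */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Padding wasted by one request of 'request' bytes when every allocation
 * must begin on its own 'align' boundary, so the next allocation cannot
 * start until the following boundary. */
static size_t padding(size_t request, size_t align)
{
    return align_up(request, align) - request;
}
```

With 8K pages, padding(12, 8192) is 8180 bytes and padding(64, 8192) is 8128 bytes, so nearly all of each small allocation is padding.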

As the number of local processes increases, these per-connection
allocations therefore become very costly.  With 8K pages, for example,
and 100 on-node processes, we're talking roughly 3*100*100*8K = 240
Mbytes.  For 512 on-node processes (yes, we have nodes this big), that's
6 Gbytes... most of which is unused.  (E.g., allocating a full 8K page,
or more, when we only need 64 or 12 bytes.)
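The arithmetic above can be checked with a few lines of C. The estimate uses N*N connections; the exact count, one ompi_fifo_init() call per sender/receiver pair, is N*(N-1). Both assume each of the three allocations occupies exactly one page, i.e. every request is smaller than a page, which holds for the sizes listed above with 8K pages. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exact count: one FIFO per ordered pair of distinct local processes. */
static uint64_t connections_exact(uint64_t n)
{
    return n == 0 ? 0 : n * (n - 1);
}

/* The round-number estimate used in the text: N*N connections. */
static uint64_t connections_estimate(uint64_t n)
{
    return n * n;
}

/* Bytes consumed when each of the three per-connection allocations
 * occupies exactly one page (every request is smaller than a page). */
static uint64_t page_aligned_bytes(uint64_t conns, uint64_t page)
{
    return 3 * conns * page;
}
```

With page = 8192, the N*N estimate gives 245,760,000 bytes for N = 100 (the "240 Mbytes" above) and 6,442,450,944 bytes, exactly 6 GiB, for N = 512. The exact N*(N-1) counts are only slightly lower: 243,302,400 and 6,429,868,032 bytes.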

Okay, long intro.  Let me start with a short question:  do we really 
need page alignment for these allocations?  Would cacheline alignment be 
okay?
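For comparison, the following sketch models the pool as a simple offset-based bump allocator and measures how much space one connection's three allocations consume at 64-byte alignment (assumed here as a typical cacheline size) versus 8K alignment. bump_pool_t, bump_alloc() and bytes_per_connection() are hypothetical and only approximate what mca_mpool_sm_alloc() does; the request sizes are the approximate LP64 figures from the list above. Offsets are used instead of pointers because a shared segment may be mapped at different addresses in different processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offset-based bump allocator over a shared segment. */
typedef struct {
    size_t size;  /* total bytes in the segment */
    size_t used;  /* offset of the next free byte */
} bump_pool_t;

#define BUMP_FAIL SIZE_MAX

/* Reserve 'request' bytes at an offset that is a multiple of 'align'
 * (a power of two). Returns the offset, or BUMP_FAIL if it does not fit. */
static size_t bump_alloc(bump_pool_t *pool, size_t request, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (pool->used + align - 1) & ~(align - 1);
    if (start > pool->size || request > pool->size - start)
        return BUMP_FAIL;
    pool->used = start + request;
    return start;
}

/* Segment bytes one connection consumes at a given alignment, using the
 * approximate sizes 64 B (wrapper), 1 KB (pointer array) and 12 B (ctl).
 * Rounded up so the next connection starts on a fresh boundary. */
static size_t bytes_per_connection(size_t align)
{
    bump_pool_t pool = { 1u << 20, 0 };
    bump_alloc(&pool, 64, align);
    bump_alloc(&pool, 1024, align);
    bump_alloc(&pool, 12, align);
    return (pool.used + align - 1) & ~(align - 1);
}
```

Under these assumptions a connection takes 1152 bytes with 64-byte alignment versus 24576 bytes with 8K alignment, so for 512 processes (261,632 connections) the total drops from about 6.4 Gbytes to about 301 Mbytes. Cacheline alignment still keeps each structure off cache lines shared with its neighbors, which is the usual reason to align shared control data in the first place.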

(I imagine I'll have follow-up questions once the answers start to roll in.)
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel