I have not looked at the code in a long time, so I am not sure how much has changed ... In general, what you are suggesting is reasonable. However, especially on large machines, you also need to worry about memory locality, so you should allocate from memory pools that are appropriately located. I expect that memory allocated on a per-socket basis would do. Having said that, I have no clue how easy this is to implement within the current code base, but I expect you can rely on first-touch after the procs are locked down to simplify the implementation.
Rich

----- Original Message -----
From: devel-boun...@open-mpi.org <devel-boun...@open-mpi.org>
To: de...@open-mpi.org <de...@open-mpi.org>
Sent: Fri Aug 29 20:52:10 2008
Subject: [OMPI devel] allocating sm memory with page alignment

(I'm new to Open MPI.)

I'm looking at the sm BTL. In mca_btl_sm_add_procs(), there's a loop over peer processes, with a call to ompi_fifo_init(). That is, one call to ompi_fifo_init() for each connection (sender/receiver pair).

In ompi_fifo_init(), there's an allocation of sizeof(ompi_cb_fifo_wrapper_t), and a call to ompi_cb_fifo_init(), which in turn has two allocations: one of a bunch of pointers and another of sizeof(ompi_cb_fifo_ctl_t).

In short, for each connection, there are three allocations:

*) sizeof(ompi_cb_fifo_wrapper_t)... about 64 bytes on LP64
*) a bunch of pointers... about 1 Kbyte on LP64
*) sizeof(ompi_cb_fifo_ctl_t)... about 12 bytes

Let me say this yet another way. For N local processes, there are N*(N-1) per-connection allocations, most of which are 64 bytes or smaller.

BUT, in ompi_fifo_init() and ompi_cb_fifo_init(), we ask for page alignment of each allocation. Further, in mca_mpool_sm_alloc() that alignment is further reinforced to be on page boundaries.

As the number of local processes increases, therefore these per-connection allocations become very costly. For 8K pages, for example, and 100 on-node processes, we're talking 3*100*100*8K = 240 Mbytes. For 512 on-node processes (yes, we have nodes this big), that's 6 Gbyte... most of which is unused. (E.g., allocating more than an 8K page when we only need 64 or 12 bytes.)

Okay, long intro. Let me start with a short question: do we really need page alignment for these allocations? Would cacheline alignment be okay?

(I imagine I'll have follow-up questions once the answers start to roll in.)

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
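
To make the arithmetic in the original message concrete, here is a rough sketch in C that rounds the three per-connection allocations up to a given alignment and multiplies by the number of connections. It is only an illustration, not Open MPI code: the helper names (align_up, per_connection_bytes, total_bytes) are made up for this example, the three sizes are the approximate figures quoted in the message, and the connection count is taken as N*N to match the back-of-the-envelope estimate rather than the exact N*(N-1). The 64-byte cacheline used in the comparison below is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment (must be a power of two). */
uint64_t align_up(uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Approximate per-connection allocation sizes quoted in the message (LP64):
 * the fifo wrapper, the array of pointers, and the fifo control struct. */
static const uint64_t alloc_sizes[] = { 64, 1024, 12 };

/* Bytes consumed by one connection when every allocation is padded
 * out to the given alignment. */
uint64_t per_connection_bytes(uint64_t alignment)
{
    uint64_t total = 0;
    for (size_t i = 0; i < sizeof(alloc_sizes) / sizeof(alloc_sizes[0]); i++) {
        total += align_up(alloc_sizes[i], alignment);
    }
    return total;
}

/* Total bytes for nprocs on-node processes, using nprocs * nprocs
 * connections as in the message's estimate. */
uint64_t total_bytes(uint64_t nprocs, uint64_t alignment)
{
    return nprocs * nprocs * per_connection_bytes(alignment);
}
```

With these assumptions, total_bytes(100, 8192) is 245,760,000 bytes (the roughly 240 Mbytes quoted above) and total_bytes(512, 8192) is 6,442,450,944 bytes (6 Gbyte), whereas total_bytes(100, 64) is 11,520,000 bytes. That gap is the potential saving from cacheline alignment, leaving aside the locality concerns raised in the reply.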