Sorry, Jeff, I missed your message about sending this to the devel list.

Background: I wanted to be able to easily create communicators based on the locality of the PUs used by MPI. My initial idea was to use MPI_Win_create to set up shared memory based on locality. In my use case I have a few arrays which are rarely needed, but when they are, I need the information from all processes. Instead of performing a full Allgather I could place the data in a shared-memory segment and skip the communication overhead, paying only the cost of memory locality. Ok, this might be too specific, but I wanted to test it to learn something about shared memory in MPI ;)
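To make the use case concrete, here is a minimal C sketch of what I am after, using only standard MPI-3 calls (MPI_Comm_split_type with MPI_COMM_TYPE_SHARED plus MPI_Win_allocate_shared); the array size and the values written are just placeholders for the example:

/* Each rank on a node writes its part into one shared-memory window and
   reads the other parts directly, instead of doing an Allgather. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int wrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

    /* Per-node communicator; my branch adds the finer-grained variants. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    /* Each rank contributes nvals doubles to the node-local segment. */
    const int nvals = 4;
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(nvals * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);

    MPI_Win_fence(0, win);
    for (int i = 0; i < nvals; i++)
        mine[i] = wrank + 0.1 * i;      /* the rarely updated data */
    MPI_Win_fence(0, win);

    /* Read node-rank 0's part of the segment directly; no messages. */
    MPI_Aint sz;
    int disp;
    double *first;
    MPI_Win_shared_query(win, 0, &sz, &disp, &first);
    printf("node rank %d of %d sees first[0] = %f\n", nrank, nsize, first[0]);

    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}

The point is only that the collective communication is replaced by direct loads from the shared segment.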
This functionality already exists in the hwloc layer; it contains all the information that is needed. So I worked on the idea and got MPI to recognize a few more flags based on the locality provided by hwloc. MPI_Comm_split_type already provides one such split, MPI_COMM_TYPE_SHARED, which does pretty much what I wanted, but it falls short of extending the scheme to all levels of control. So I added communicator splits based on these locality segments (a small C usage sketch is included further below, after the hwloc output of my test systems):

OMPI_COMM_TYPE_CU
OMPI_COMM_TYPE_HOST
OMPI_COMM_TYPE_BOARD
OMPI_COMM_TYPE_NODE      // same as MPI_COMM_TYPE_SHARED
MPI_COMM_TYPE_SHARED     // same as OMPI_COMM_TYPE_NODE
OMPI_COMM_TYPE_NUMA
OMPI_COMM_TYPE_SOCKET
OMPI_COMM_TYPE_L3CACHE
OMPI_COMM_TYPE_L2CACHE
OMPI_COMM_TYPE_L1CACHE
OMPI_COMM_TYPE_CORE
OMPI_COMM_TYPE_HWTHREAD

My branch can be found at: https://github.com/zerothi/ompi

First, a small "bug" report on the compilation. I had problems right after the autogen.pl script. Procedure:

$> git clone .. ompi
$> cd ompi
$> ./autogen.pl

My build tool versions:

m4: 1.4.17
automake: 1.14
autoconf: 2.69
libtool: 2.4.3

The autogen completes successfully (attached is the autogen output if needed).

$> mkdir build
$> cd build
$> ../configure --with-platform=optimized

I have attached the config.log (note that I have tested with both the shipped 1.9.1 and the 1.10.0 hwloc).

$> make all

The error message is:

make[2]: Entering directory '/home/nicpa/test/build/opal/libltdl'
CDPATH="${ZSH_VERSION+.}:" && cd ../../../opal/libltdl && /bin/bash /home/nicpa/test/config/missing aclocal-1.14 -I ../../config
aclocal-1.14: error: ../../config/autogen_found_items.m4:308: file 'opal/mca/backtrace/configure.m4' does not exist

This is the same error message as reported here:
http://www.open-mpi.org/community/lists/devel/2013/07/12504.php

My work-around is simple. It has to do with the generated ACLOCAL_AMFLAGS variable in build/opal/libltdl/Makefile:

OLD:     ACLOCAL_AMFLAGS = -I ../../config
CORRECT: ACLOCAL_AMFLAGS = -I ../../

Either the configure script creates the wrong include paths for the m4 scripts, or the m4 scripts are not copied fully to the config directory. Ok, it works and the fix is simple; I just wonder why.

Here is my test system 1:

$> hwloc-info
depth 0: 1 Machine (type #1)
 depth 1: 1 Socket (type #3)
 depth 2: 1 L3Cache (type #4)
 depth 3: 2 L2Cache (type #4)
 depth 4: 2 L1dCache (type #4)
 depth 5: 2 L1iCache (type #4)
 depth 6: 2 Core (type #5)
 depth 7: 4 PU (type #6)
Special depth -3: 2 Bridge (type #9)
Special depth -4: 4 PCI Device (type #10)
Special depth -5: 5 OS Device (type #11)

and my test system 2:

depth 0: 1 Machine (type #1)
 depth 1: 1 Socket (type #3)
 depth 2: 1 L3Cache (type #4)
 depth 3: 4 L2Cache (type #4)
 depth 4: 4 L1dCache (type #4)
 depth 5: 4 L1iCache (type #4)
 depth 6: 4 Core (type #5)
 depth 7: 8 PU (type #6)
Special depth -3: 3 Bridge (type #9)
Special depth -4: 3 PCI Device (type #10)
Special depth -5: 4 OS Device (type #11)
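For reference, this is roughly how one of the new split types is called from C; the OMPI_COMM_TYPE_* constants are Open MPI specific and only exist on my branch, everything else is standard MPI (the attached Fortran program does the same thing for every type):

/* Sketch: split MPI_COMM_WORLD into one communicator per socket.
   OMPI_COMM_TYPE_SOCKET is an Open MPI specific extension (my branch). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm socket_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_SOCKET, 0,
                        MPI_INFO_NULL, &socket_comm);

    int rank, size;
    MPI_Comm_rank(socket_comm, &rank);
    MPI_Comm_size(socket_comm, &size);
    printf("socket-local rank %d out of %d ranks\n", rank, size);

    MPI_Comm_free(&socket_comm);
    MPI_Finalize();
    return 0;
}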
Here is an excerpt of what it can do (I have attached a Fortran program that creates a communicator using all of the types):

$> mpirun -np 4 ./comm_split
Example of MPI_Comm_Split_Type
Currently using 4 nodes.
Comm using CU Node: 2 local rank: 2 out of 4 ranks
Comm using CU Node: 3 local rank: 3 out of 4 ranks
Comm using CU Node: 1 local rank: 1 out of 4 ranks
Comm using CU Node: 0 local rank: 0 out of 4 ranks
Comm using Host Node: 0 local rank: 0 out of 4 ranks
Comm using Host Node: 2 local rank: 2 out of 4 ranks
Comm using Host Node: 3 local rank: 3 out of 4 ranks
Comm using Host Node: 1 local rank: 1 out of 4 ranks
Comm using Board Node: 2 local rank: 2 out of 4 ranks
Comm using Board Node: 3 local rank: 3 out of 4 ranks
Comm using Board Node: 1 local rank: 1 out of 4 ranks
Comm using Board Node: 0 local rank: 0 out of 4 ranks
Comm using Node Node: 0 local rank: 0 out of 4 ranks
Comm using Node Node: 1 local rank: 1 out of 4 ranks
Comm using Node Node: 2 local rank: 2 out of 4 ranks
Comm using Node Node: 3 local rank: 3 out of 4 ranks
Comm using Shared Node: 0 local rank: 0 out of 4 ranks
Comm using Shared Node: 3 local rank: 3 out of 4 ranks
Comm using Shared Node: 1 local rank: 1 out of 4 ranks
Comm using Shared Node: 2 local rank: 2 out of 4 ranks
Comm using Numa Node: 0 local rank: 0 out of 1 ranks
Comm using Numa Node: 2 local rank: 0 out of 1 ranks
Comm using Numa Node: 3 local rank: 0 out of 1 ranks
Comm using Numa Node: 1 local rank: 0 out of 1 ranks
Comm using Socket Node: 1 local rank: 0 out of 1 ranks
Comm using Socket Node: 2 local rank: 0 out of 1 ranks
Comm using Socket Node: 3 local rank: 0 out of 1 ranks
Comm using Socket Node: 0 local rank: 0 out of 1 ranks
Comm using L3 Node: 0 local rank: 0 out of 1 ranks
Comm using L3 Node: 3 local rank: 0 out of 1 ranks
Comm using L3 Node: 1 local rank: 0 out of 1 ranks
Comm using L3 Node: 2 local rank: 0 out of 1 ranks
Comm using L2 Node: 2 local rank: 0 out of 1 ranks
Comm using L2 Node: 3 local rank: 0 out of 1 ranks
Comm using L2 Node: 1 local rank: 0 out of 1 ranks
Comm using L2 Node: 0 local rank: 0 out of 1 ranks
Comm using L1 Node: 0 local rank: 0 out of 1 ranks
Comm using L1 Node: 1 local rank: 0 out of 1 ranks
Comm using L1 Node: 2 local rank: 0 out of 1 ranks
Comm using L1 Node: 3 local rank: 0 out of 1 ranks
Comm using Core Node: 0 local rank: 0 out of 1 ranks
Comm using Core Node: 3 local rank: 0 out of 1 ranks
Comm using Core Node: 1 local rank: 0 out of 1 ranks
Comm using Core Node: 2 local rank: 0 out of 1 ranks
Comm using HW Node: 2 local rank: 0 out of 1 ranks
Comm using HW Node: 3 local rank: 0 out of 1 ranks
Comm using HW Node: 1 local rank: 0 out of 1 ranks
Comm using HW Node: 0 local rank: 0 out of 1 ranks

This is the output on both systems (note that on the first system I oversubscribe the node). I have not tested it on a cluster :(

One thing that worries me is that the SOCKET and L3 cache split types are not of size 4. I only have one socket and one L3 cache, so all ranks must be sharing them? I am not so sure about NUMA in this case. If you need any more information about my setup to debug this, please let me know. Or am I completely missing something? I tried looking into opal/mca/hwloc/hwloc.h, but I have no idea whether it is related to the problem or not.

If you think it is worthwhile, I can make a pull request at its current stage?

--
Kind regards
Nick
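PS: In case it helps with debugging the socket/L3 question above, this is the kind of check I intend to run to see what each rank is actually bound to. It uses plain hwloc; my guess (only a guess) is that the splits below node level depend on the processes actually being bound somewhere:

/* Sketch: print each rank's current CPU binding using hwloc directly. */
#include <mpi.h>
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Query the cpuset this process is currently bound to. */
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_PROCESS);

    char *str;
    hwloc_bitmap_asprintf(&str, set);
    printf("rank %d bound to cpuset %s\n", rank, str);

    free(str);
    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    MPI_Finalize();
    return 0;
}

mpirun's --report-bindings option should show much the same information.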
autogen.out.bz2
Description: BZip2 compressed data
config.log.bz2
Description: BZip2 compressed data
comm_split.f90
Description: Binary data