Sorry, Jeff, I missed your message about sending it to the dev list.

Background:
I wanted an easy way to create communicators based on the locality of the
PUs used by the MPI processes.
My initial idea was to use MPI_Win_create to set up shared memory based on
locality.
In my application I have a few arrays which are rarely needed, but when
they are, every process needs the data from all other processes.
Instead of performing a full MPI_Allgather I could place the data in a
shared-memory segment and skip the communication overhead, paying only the
cost of memory locality. Ok, this might be too specific, but I wanted to
test it to learn something about shared memory in MPI ;)
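
To make the idea concrete, here is a rough, untested C sketch of the
pattern I have in mind (it uses MPI_COMM_TYPE_SHARED together with
MPI_Win_allocate_shared; the element count and the data are placeholders
only):

#include <mpi.h>

/* Sketch: expose one rarely needed array per rank through a node-local
 * shared-memory window instead of doing an MPI_Allgather.
 * "nelem" is a placeholder for illustration only. */
void share_on_node(int nelem)
{
  MPI_Comm node_comm;
  MPI_Win win;
  double *my_part;            /* this rank's slice of the window */
  MPI_Aint size = nelem * sizeof(double);

  /* All ranks on the same shared-memory node end up in node_comm */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);

  /* Each rank contributes its own segment of one shared window */
  MPI_Win_allocate_shared(size, sizeof(double), MPI_INFO_NULL,
                          node_comm, &my_part, &win);

  /* ... fill my_part ... */
  MPI_Win_fence(0, win);

  /* Any rank on the node can read another rank's segment directly */
  MPI_Aint rsize;
  int rdisp;
  double *rank0_part;
  MPI_Win_shared_query(win, 0, &rsize, &rdisp, &rank0_part);
  /* rank0_part now points at rank 0's segment on this node */

  MPI_Win_fence(0, win);
  MPI_Win_free(&win);
  MPI_Comm_free(&node_comm);
}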

This functionality already exists in the hwloc layer; it contains all the
information that is needed.

So I worked on the idea and got Open MPI to recognize a few more split
types based on the locality information provided by hwloc.
The function MPI_Comm_split_type already provides this kind of splitting
via MPI_COMM_TYPE_SHARED, which pretty much does what I wanted, but it
falls short of a general scheme covering all levels of the hardware
hierarchy.

So I added communicator splittings for the following locality levels (a
small usage example follows the list):
OMPI_COMM_TYPE_CU
OMPI_COMM_TYPE_HOST
OMPI_COMM_TYPE_BOARD
OMPI_COMM_TYPE_NODE // same as MPI_COMM_TYPE_SHARED
MPI_COMM_TYPE_SHARED // same as OMPI_COMM_TYPE_NODE
OMPI_COMM_TYPE_NUMA
OMPI_COMM_TYPE_SOCKET
OMPI_COMM_TYPE_L3CACHE
OMPI_COMM_TYPE_L2CACHE
OMPI_COMM_TYPE_L1CACHE
OMPI_COMM_TYPE_CORE
OMPI_COMM_TYPE_HWTHREAD
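
For illustration, a call with one of the new types looks just like the
standard shared-memory split; a small sketch in C (remember that the
OMPI_COMM_TYPE_* constants only exist in my branch):

MPI_Comm sock_comm;
int lrank, lsize;

/* OMPI_COMM_TYPE_SOCKET comes from my branch; stock Open MPI only
 * provides MPI_COMM_TYPE_SHARED here. */
MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_SOCKET, 0,
                    MPI_INFO_NULL, &sock_comm);
MPI_Comm_rank(sock_comm, &lrank);
MPI_Comm_size(sock_comm, &lsize);
/* lsize should equal the number of ranks sharing this socket */
MPI_Comm_free(&sock_comm);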

My branch can be found at: https://github.com/zerothi/ompi

First, a small "bug" report on the compilation:
I ran into problems right after running the autogen.pl script.
Procedure:
$> git clone .. ompi
$> cd ompi
$> ./autogen.pl
My autotools versions:
m4: 1.4.17
automake: 1.14
autoconf: 2.69
libtool: 2.4.3
autogen completes successfully (the autogen output is attached if needed).
$> mkdir build
$> cd build
$> ../configure --with-platform=optimized
I have attached the config.log (note that I have tested it with both the
shipped hwloc 1.9.1 and 1.10.0).
$> make all
The error message is:
make[2]: Entering directory '/home/nicpa/test/build/opal/libltdl'
CDPATH="${ZSH_VERSION+.}:" && cd ../../../opal/libltdl && /bin/bash
/home/nicpa/test/config/missing aclocal-1.14 -I ../../config
aclocal-1.14: error: ../../config/autogen_found_items.m4:308: file
'opal/mca/backtrace/configure.m4' does not exist
This is the same error message as reported here:
http://www.open-mpi.org/community/lists/devel/2013/07/12504.php
My work-around is simple.
It has to do with the generated ACLOCAL_AMFLAGS variable in
build/opal/libltdl/Makefile:
OLD:
ACLOCAL_AMFLAGS = -I ../../config
CORRECT:
ACLOCAL_AMFLAGS = -I ../../
Either the configure script creates the wrong include path for the m4
scripts, or the m4 scripts are not fully copied to the config directory.
Ok, it works and the fix is simple; I just wonder why it is needed.


Here is my test system 1:
$> hwloc-info
depth 0: 1 Machine (type #1)
depth 1: 1 Socket (type #3)
depth 2: 1 L3Cache (type #4)
depth 3: 2 L2Cache (type #4)
depth 4: 2 L1dCache (type #4)
depth 5: 2 L1iCache (type #4)
depth 6: 2 Core (type #5)
depth 7: 4 PU (type #6)
Special depth -3: 2 Bridge (type #9)
Special depth -4: 4 PCI Device (type #10)
Special depth -5: 5 OS Device (type #11)

and my test system 2:
depth 0: 1 Machine (type #1)
depth 1: 1 Socket (type #3)
depth 2: 1 L3Cache (type #4)
depth 3: 4 L2Cache (type #4)
depth 4: 4 L1dCache (type #4)
depth 5: 4 L1iCache (type #4)
depth 6: 4 Core (type #5)
depth 7: 8 PU (type #6)
Special depth -3: 3 Bridge (type #9)
Special depth -4: 3 PCI Device (type #10)
Special depth -5: 4 OS Device (type #11)

Here is an excerpt of what it can do (I have attached a Fortran program
that creates a communicator for each of the types):

$> mpirun -np 4 ./comm_split
Example of MPI_Comm_Split_Type

Currently using 4 nodes.

Comm using CU Node: 2 local rank: 2 out of 4 ranks
Comm using CU Node: 3 local rank: 3 out of 4 ranks
Comm using CU Node: 1 local rank: 1 out of 4 ranks
Comm using CU Node: 0 local rank: 0 out of 4 ranks

Comm using Host Node: 0 local rank: 0 out of 4 ranks
Comm using Host Node: 2 local rank: 2 out of 4 ranks
Comm using Host Node: 3 local rank: 3 out of 4 ranks
Comm using Host Node: 1 local rank: 1 out of 4 ranks

Comm using Board Node: 2 local rank: 2 out of 4 ranks
Comm using Board Node: 3 local rank: 3 out of 4 ranks
Comm using Board Node: 1 local rank: 1 out of 4 ranks
Comm using Board Node: 0 local rank: 0 out of 4 ranks

Comm using Node Node: 0 local rank: 0 out of 4 ranks
Comm using Node Node: 1 local rank: 1 out of 4 ranks
Comm using Node Node: 2 local rank: 2 out of 4 ranks
Comm using Node Node: 3 local rank: 3 out of 4 ranks

Comm using Shared Node: 0 local rank: 0 out of 4 ranks
Comm using Shared Node: 3 local rank: 3 out of 4 ranks
Comm using Shared Node: 1 local rank: 1 out of 4 ranks
Comm using Shared Node: 2 local rank: 2 out of 4 ranks

Comm using Numa Node: 0 local rank: 0 out of 1 ranks
Comm using Numa Node: 2 local rank: 0 out of 1 ranks
Comm using Numa Node: 3 local rank: 0 out of 1 ranks
Comm using Numa Node: 1 local rank: 0 out of 1 ranks

Comm using Socket Node: 1 local rank: 0 out of 1 ranks
Comm using Socket Node: 2 local rank: 0 out of 1 ranks
Comm using Socket Node: 3 local rank: 0 out of 1 ranks
Comm using Socket Node: 0 local rank: 0 out of 1 ranks

Comm using L3 Node: 0 local rank: 0 out of 1 ranks
Comm using L3 Node: 3 local rank: 0 out of 1 ranks
Comm using L3 Node: 1 local rank: 0 out of 1 ranks
Comm using L3 Node: 2 local rank: 0 out of 1 ranks

Comm using L2 Node: 2 local rank: 0 out of 1 ranks
Comm using L2 Node: 3 local rank: 0 out of 1 ranks
Comm using L2 Node: 1 local rank: 0 out of 1 ranks
Comm using L2 Node: 0 local rank: 0 out of 1 ranks

Comm using L1 Node: 0 local rank: 0 out of 1 ranks
Comm using L1 Node: 1 local rank: 0 out of 1 ranks
Comm using L1 Node: 2 local rank: 0 out of 1 ranks
Comm using L1 Node: 3 local rank: 0 out of 1 ranks

Comm using Core Node: 0 local rank: 0 out of 1 ranks
Comm using Core Node: 3 local rank: 0 out of 1 ranks
Comm using Core Node: 1 local rank: 0 out of 1 ranks
Comm using Core Node: 2 local rank: 0 out of 1 ranks

Comm using HW Node: 2 local rank: 0 out of 1 ranks
Comm using HW Node: 3 local rank: 0 out of 1 ranks
Comm using HW Node: 1 local rank: 0 out of 1 ranks
Comm using HW Node: 0 local rank: 0 out of 1 ranks

This is the output on both systems (note that on the first system I
oversubscribe the node). I have not tested it on a cluster :(.
One thing that worries me is that the SOCKET and L3 cache split types do
not give communicators of size 4. I only have one socket and one L3 cache,
so all four processes should be sharing them?
I am not so sure about NUMA in this case. If you need any more information
about my setup to debug this, please let me know.
Or am I completely missing something?

I tried looking into opal/mca/hwloc/hwloc.h, but I have no idea whether
the definitions there are related to the problem or not.

If you think it makes sense, I can make a pull request at its current
stage?

--
Kind regards Nick

Attachment: autogen.out.bz2
Description: BZip2 compressed data

Attachment: config.log.bz2
Description: BZip2 compressed data

Attachment: comm_split.f90
Description: Binary data
