On Wed, 2010-07-28 at 20:36 +0200, Brice Goglin wrote:
> Le 28/07/2010 18:53, Brice Goglin a écrit :
> > Distance matrix between Group0 objects:
> > 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66
> > 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64
> > 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
> > 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60
> > 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
> > 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
> > 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
> > 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52
> > 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> > 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48
> > 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46
> > 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44
> > 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42
> > 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40
> > 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38
> > 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36
> > 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34
> > 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32
> > 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30
> > 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28
> > 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26
> > 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24
> > 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22
> > 66 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13
> >
> > Between Group1:
> > 17 24 28 32 36 40 44 48 52 56 60 64
> > 24 17 24 28 32 36 40 44 48 52 56 60
> > 28 24 17 24 28 32 36 40 44 48 52 56
> > 32 28 24 17 24 28 32 36 40 44 48 52
> > 36 32 28 24 17 24 28 32 36 40 44 48
> > 40 36 32 28 24 17 24 28 32 36 40 44
> > 44 40 36 32 28 24 17 24 28 32 36 40
> > 48 44 40 36 32 28 24 17 24 28 32 36
> > 52 48 44 40 36 32 28 24 17 24 28 32
> > 56 52 48 44 40 36 32 28 24 17 24 28
> > 60 56 52 48 44 40 36 32 28 24 17 24
> > 64 60 56 52 48 44 40 36 32 28 24 17
> >
> > Group2:
> > 20 28 36 44 52 60
> > 28 20 28 36 44 52
> > 36 28 20 28 36 44
> > 44 36 28 20 28 36
> > 52 44 36 28 20 28
> > 60 52 44 36 28 20
> >
> > Group3:
> > 24 36 52
> > 36 24 36
> > 52 36 24
> >   
> 
> Actually, all these distance matrices (except the NUMA nodes' one, which
> is not included above) show a ring topology without the link between
> the first and the last object. So grouping makes no sense there. hwloc
> 1.0.x groups object #2N with object #2N+1 because its grouping algorithm
> isn't very clever. It could just as well link #2N-1 with #2N; that would
> not be worse. The grouping algorithm in svn trunk is smarter: it detects
> this ring properly and does not group anything (except pairs of NUMA nodes).
> 
> It's actually surprising that this machine doesn't show a better
> distance matrix. I guess SGI still has a hypercube or some other nice
> topology interconnecting IRUs and blades. Older Altix machines had very
> nice distance matrices where we would detect multiple levels of groups
> that really showed the physical hierarchy of blades/IRUs/... I wonder if
> your SGI BIOS is buggy :)
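
The "ring without the closing link" observation can be checked mechanically. Below is a minimal Python sketch. It assumes the distances are available as a plain list of lists, and the helper names are hypothetical, not hwloc API. The sketch does three things:

1. It rebuilds the quoted Group0 matrix from the pattern visible in its rows: 13 on the diagonal and 20 + 2*|i - j| everywhere else.
2. It checks that the distance depends only on the index gap and grows strictly with it. That makes the matrix an open chain, because a ring would bring the last object back close to the first.
3. It shows that every interior object has two equally near neighbours. This is why pairing #2N with #2N+1 is no better than pairing #2N-1 with #2N.

```python
# Illustrative sketch; function names are hypothetical, not part of hwloc.

def group0_matrix(n=24):
    """Rebuild the quoted Group0 matrix: 13 on the diagonal,
    20 + 2*|i - j| elsewhere (pattern read off the quoted rows)."""
    return [[13 if i == j else 20 + 2 * abs(i - j) for j in range(n)]
            for i in range(n)]


def nearest_neighbours(d, i):
    """Return the indices at the minimum off-diagonal distance from object i."""
    others = [j for j in range(len(d)) if j != i]
    best = min(d[i][j] for j in others)
    return [j for j in others if d[i][j] == best]


def is_open_chain(d):
    """True if the distance depends only on the gap |i - j| and grows
    strictly with it, so the first/last pair is the farthest apart
    (no wrap-around link as in a real ring)."""
    n = len(d)
    by_gap = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                by_gap.setdefault(abs(i - j), set()).add(d[i][j])
    # Every gap must map to exactly one distance value.
    if any(len(values) != 1 for values in by_gap.values()):
        return False
    steps = [next(iter(by_gap[g])) for g in range(1, n)]
    return all(a < b for a, b in zip(steps, steps[1:]))


if __name__ == "__main__":
    d = group0_matrix()
    print(is_open_chain(d))          # chain, not a closed ring
    print(nearest_neighbours(d, 0))  # an end object has one nearest neighbour
    print(nearest_neighbours(d, 5))  # an interior object has two, tied
```

Because objects 4 and 6 are equally close to object 5, any fixed pairing rule on such a chain is arbitrary. That is consistent with not grouping at all.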

It would not be the first case of a buggy BIOS. I'll forward our
discussion to our SGI representatives and Alexis Cousein and Rüdiger
Wolff from SGI (M. Raymond may know him). Let's see what they say. We
are one of the early UltraViolet customers.

From my point of view, having the groupings beyond the blade level in
the hwloc topology is good for our purposes. We want to use the hwloc
topology to calculate pinning maps for MPI applications. Currently we
use the distance matrix obtained via hwloc to scatter tasks according to a
maximum-distance approach between HWLOC_OBJ_PU objects. I also gave our
current algorithm to the MVAPICH2 dev team, who want to put it into
the next 1.5.x release.
With the example UV topology we are discussing here, our pinning map starts
with PU objects os_index 0,256,128,320,..., which means: 1st task on the 1st
CPU of the 1st Group3, 2nd task on the 1st CPU of the 3rd Group3 (the
lonely one), 3rd task on the 1st CPU of the 2nd Group3. Keep in mind that an
MPI application that was given all CPUs of this topology may start only 3
tasks, each allocating far more memory than a single NUMA node has
directly attached. Reducing the topology to the NUMA-node or blade level
would then be a bad idea, because our pinning map would start with
0,16,32,48,... (with only the Group0 level and none of the higher groupings).
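
A maximum-distance scatter can be sketched as a greedy farthest-first (maximin) ordering. Start from one object, then repeatedly append the object whose smallest distance to everything already chosen is largest.

This is an illustrative Python sketch, not the ZIB algorithm, which is not shown in this mail. It makes a few assumptions:

- The function name is hypothetical.
- It orders objects of a single distance matrix rather than producing PU os_index values.
- Ties are broken toward the lower index.

Applied to the quoted Group3 matrix, it yields the order 1st, 3rd, 2nd Group3. That matches the start of the pinning map described above.

```python
# Illustrative farthest-first ordering; not the actual ZIB/MVAPICH2 code.

def max_distance_order(d, start=0):
    """Greedy maximin ordering over a symmetric distance matrix d.

    At each step, pick the remaining object whose minimum distance to the
    already-picked objects is largest; ties go to the lowest index
    (an assumption made for this sketch)."""
    order = [start]
    remaining = [i for i in range(len(d)) if i != start]
    while remaining:
        nxt = max(remaining,
                  key=lambda j: (min(d[j][k] for k in order), -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Group3 distances as quoted earlier in this thread.
    group3 = [[24, 36, 52],
              [36, 24, 36],
              [52, 36, 24]]
    print(max_distance_order(group3))  # 1st, then 3rd, then 2nd Group3
```

The greedy step can only spread tasks across the distinctions the matrix actually encodes. Which levels are exposed therefore directly affects the resulting order.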

Comments appreciated!

> Michael Raymond, anything to say about this?
> 
> Brice
> 

-- 
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311
e-mail: kall...@zib.de
