On Wed, 2010-07-28 at 20:36 +0200, Brice Goglin wrote:
> On 28/07/2010 18:53, Brice Goglin wrote:
> > Distance matrix between Group0 objects:
> > 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66
> > 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64
> > 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
> > 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60
> > 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
> > 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
> > 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
> > 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52
> > 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> > 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48
> > 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46
> > 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44
> > 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42
> > 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40
> > 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38
> > 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36
> > 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34
> > 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32
> > 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30
> > 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28
> > 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26
> > 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24
> > 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22
> > 66 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13
> >
> > Between Group1 objects:
> > 17 24 28 32 36 40 44 48 52 56 60 64
> > 24 17 24 28 32 36 40 44 48 52 56 60
> > 28 24 17 24 28 32 36 40 44 48 52 56
> > 32 28 24 17 24 28 32 36 40 44 48 52
> > 36 32 28 24 17 24 28 32 36 40 44 48
> > 40 36 32 28 24 17 24 28 32 36 40 44
> > 44 40 36 32 28 24 17 24 28 32 36 40
> > 48 44 40 36 32 28 24 17 24 28 32 36
> > 52 48 44 40 36 32 28 24 17 24 28 32
> > 56 52 48 44 40 36 32 28 24 17 24 28
> > 60 56 52 48 44 40 36 32 28 24 17 24
> > 64 60 56 52 48 44 40 36 32 28 24 17
> >
> > Between Group2 objects:
> > 20 28 36 44 52 60
> > 28 20 28 36 44 52
> > 36 28 20 28 36 44
> > 44 36 28 20 28 36
> > 52 44 36 28 20 28
> > 60 52 44 36 28 20
> >
> > Between Group3 objects:
> > 24 36 52
> > 36 24 36
> > 52 36 24
> >
>
> Actually, all these distance matrices (except the NUMA nodes' one, which
> is not included above) show a ring topology without the link between
> the first and the last object, i.e. a simple linear chain. So grouping
> makes no sense there. hwloc 1.0.x groups object #2N with object #2N+1
> only because its grouping algorithm isn't very clever; it could just as
> well pair #2N-1 with #2N, and that would be no worse. The grouping
> algorithm in svn trunk is cleverer: it detects this ring properly and
> does not group anything (except pairs of NUMA nodes).
>
> It's actually surprising that this machine doesn't show a better
> distance matrix. I would guess SGI still uses a hypercube or some other
> nice topology to interconnect IRUs and blades. Older Altix machines had
> very nice distance matrices where we would detect multiple levels of
> groups that really reflected the physical hierarchy of blades/IRUs/...
> I wonder if your SGI BIOS is buggy :)
It would not be the first case of a buggy BIOS. I'll forward our
discussion to our SGI representatives and to Alexis Cousein and Rüdiger
Wolff from SGI (M. Raymond may know him). Let's see what they say. We
are one of the early UltraViolet customers.

From my point of view, having the groupings beyond the blade level in
the hwloc topology is good for our purposes. We want to use the hwloc
topology to calculate pinning maps for MPI applications. Currently we
use the distance matrix obtained via hwloc to scatter tasks across
HWLOC_OBJ_PU objects according to a maximum-distance approach. I also
gave our current algorithm to the MVAPICH2 dev team, who want to put it
into the next 1.5.x release.

With the example UV topology we are discussing here, our pinning map
starts with PU objects os_index 0,256,128,320,..., which means the 1st
task goes on the 1st CPU of the 1st Group3, the 2nd task on the 1st CPU
of the 3rd Group3 (the lonely one), and the 3rd task on the 1st CPU of
the 2nd Group3.

Keep in mind that an MPI application that is given all CPUs of this
topology may start only 3 tasks, each allocating far more memory than a
single NUMA node has directly attached. Reducing the topology to the
NUMA-node or blade level would then be a bad idea, because our pinning
map would start with 0,16,32,48,... (with only the Group0 level and none
of the higher groupings).

Comments appreciated!

> Michael Raymond, anything to say about this?
>
> Brice

-- 
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311
e-mail: kall...@zib.de
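A small sketch may make the two arguments in this thread concrete. It assumes the distance matrices are available as plain Python lists (copied from the quoted Group2 and Group3 matrices) and does not call hwloc at all. `is_linear_chain` and `nearest_neighbours` are hypothetical helpers that check the point that these matrices describe a chain with no natural grouping. `scatter_order` is a hypothetical greedy farthest-first (max-min distance) ordering that stands in for the maximum-distance pinning approach; the actual pinning algorithm is not shown in this thread and may differ.

```python
# Hypothetical helpers: NOT hwloc's grouping code and NOT the actual
# pinning algorithm discussed in the thread, just illustrative sketches.

# Distance matrices copied from the quoted message.
GROUP2 = [
    [20, 28, 36, 44, 52, 60],
    [28, 20, 28, 36, 44, 52],
    [36, 28, 20, 28, 36, 44],
    [44, 36, 28, 20, 28, 36],
    [52, 44, 36, 28, 20, 28],
    [60, 52, 44, 36, 28, 20],
]
GROUP3 = [
    [24, 36, 52],
    [36, 24, 36],
    [52, 36, 24],
]


def is_linear_chain(dist):
    """True if dist[i][j] depends only on |i - j| and grows strictly with it,
    i.e. the objects lie on a line (a ring missing its closing link)."""
    n = len(dist)
    by_gap = {}
    for i in range(n):
        for j in range(n):
            if by_gap.setdefault(abs(i - j), dist[i][j]) != dist[i][j]:
                return False
    values = [by_gap[g] for g in range(n)]
    return all(a < b for a, b in zip(values, values[1:]))


def nearest_neighbours(dist, i):
    """All objects at the smallest off-diagonal distance from object i."""
    others = [j for j in range(len(dist)) if j != i]
    best = min(dist[i][j] for j in others)
    return [j for j in others if dist[i][j] == best]


def scatter_order(dist, start=0):
    """Greedy farthest-first ordering: each next object maximises its minimum
    distance to the objects already chosen (ties go to the lowest index)."""
    n = len(dist)
    order = [start]
    min_dist = list(dist[start])  # distance from each object to the chosen set
    while len(order) < n:
        candidates = [j for j in range(n) if j not in order]
        nxt = max(candidates, key=lambda j: (min_dist[j], -j))
        order.append(nxt)
        for j in range(n):
            min_dist[j] = min(min_dist[j], dist[nxt][j])
    return order


print(is_linear_chain(GROUP2), is_linear_chain(GROUP3))  # True True
print([nearest_neighbours(GROUP2, i) for i in range(6)])
# [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(scatter_order(GROUP3))  # [0, 2, 1]
```

On Group2 every interior object has two equally close neighbours, which is why pairing #2N with #2N+1 is an arbitrary choice. On Group3 the farthest-first order [0, 2, 1] happens to match the 1st, 3rd, 2nd Group3 order described in the reply; a real tool would still need to break ties its own way and descend from groups to individual PUs.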