On Wed, 2010-07-28 at 18:53 +0200, Brice Goglin wrote:
> On 28/07/2010 18:09, Bernd Kallies wrote:
> > Is attached. I also checked for cpusets. I ran lstopo and
> > gather_topology from the root cpuset, which is the only cpuset and
> > contains cpus 0-767 and mems 0-47, that is - the whole machine.
> >
> > Background info: The UltraViolet architecture is new. There exists a
> > white paper about this at http://www.sgi.com/pdfs/4192.pdf
> > We have one UV rack, which is filled with 3/4 of the max. number of
> > blades. According to the specs, two NUMA nodes form one "blade". This
> > level corresponds to "Group0" in the hwloc topology. Two blades are
> > cross-linked via the NUMAlink, forming "paired nodes" = "Group1". What
> > "Group2" might correspond to - I don't know.
>
> We group by distance, so it looks like there's something tagging these
> nodes as closer, and hwloc makes them a new group level.
>
> > "Group3" corresponds to one
> > "chassis" or IRU. "Group4" may be an "enclosure", and "Machine" is the
> > "rack".
> >
> > In my opinion the hwloc topology for our machine should contain 2x
> > Group4. The 1st should contain 2x Group3, the 2nd one 1x Group3. lstopo
> > shows 1x Group4 containing 3x Group3, instead.
>
> Actually no, but it's very hard to see :)
>     lstopo - | egrep "(NUMA|Group)"
> shows that Group4#0 only contains Group3#0 and #1.
> Group3#2 is directly a child of the machine (the indentation is smaller).
Ah, I see.

> Open a *big* terminal window and look at the distance matrix:
> $ cat /sys/devices/system/node/node{?,??}/distance
> (I am not copy/pasting it here, it's too big :))
>
> hwloc groups objects that have smaller distances and then computes
> distances between groups (average between distances of objects in each
> group). We get:
>
> Distance matrix between Group0 objects:
> 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66
> 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64
> 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
> 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60
> 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
> 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
> 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
> 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52
> 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48
> 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46
> 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44
> 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42
> 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40
> 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38
> 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36
> 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34
> 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32
> 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30
> 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28
> 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26
> 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24
> 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22
> 66 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13
>
> Between Group1:
> 17 24 28 32 36 40 44 48 52 56 60 64
> 24 17 24 28 32 36 40 44 48 52 56 60
> 28 24 17 24 28 32 36 40 44 48 52 56
> 32 28 24 17 24 28 32 36 40 44 48 52
> 36 32 28 24 17 24 28 32 36 40 44 48
> 40 36 32 28 24 17 24 28 32 36 40 44
> 44 40 36 32 28 24 17 24 28 32 36 40
> 48 44 40 36 32 28 24 17 24 28 32 36
> 52 48 44 40 36 32 28 24 17 24 28 32
> 56 52 48 44 40 36 32 28 24 17 24 28
> 60 56 52 48 44 40 36 32 28 24 17 24
> 64 60 56 52 48 44 40 36 32 28 24 17
>
> Group2:
> 20 28 36 44 52 60
> 28 20 28 36 44 52
> 36 28 20 28 36 44
> 44 36 28 20 28 36
> 52 44 36 28 20 28
> 60 52 44 36 28 20
>
> Group3:
> 24 36 52
> 36 24 36
> 52 36 24
>
> The way I am reading this is:
> IRU#1 is close to IRU#0 and #2, but #0 and #2 are far away from each other.
> Then I don't think we can group 2 IRU and keep a third one on the side
> as you said.
> How would you group these?
>
> That said, something is going wrong with the grouping code. Right now,
> it should keep 3 Group3 under the machine. I am looking at it.

So it seems to me that you basically get a distance matrix of PU objects
from the system (the machine vendor), and probably you do agglomerative
average linkage cluster analysis on it to determine the number and
hierarchy of HWLOC_OBJ_GROUP objects (beyond what can be named by some
hardware building block like core or cache etc). Is this right?

I'm wondering if this is the right approach. Did you try other distance
functions (e.g. single linkage)?

Besides that, and from the viewpoint of a tree representation of the
result of clustering, I would expect that every pair of objects of the
same type has common ancestors of the same type. For the given UV
topology I would not expect that there are two Group3 that have a Group4
ancestor, while the 3rd Group3 is a direct child of Machine.
I would expect EITHER that the 3rd Group3 is also a child of a Group4
(maybe a second one), OR that there is no Group4.

Sincerely BK

> Brice
>

-- 
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311
e-mail: kall...@zib.de
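
The averaging step quoted in this thread (group the closest objects, then
take the average of the distances between the members of each group) can be
checked against the quoted matrices. The sketch below is plain Python, not
hwloc code. It makes three assumptions that are inferred from the numbers
only: the Group0 matrix follows its visible pattern (13 on the diagonal,
20 + 2*|i-j| elsewhere), neighbouring objects are merged in pairs at every
level, and averages are truncated to integers. The names group0_matrix and
merge_pairs are made up for illustration.

```python
def group0_matrix(n=24):
    """Rebuild the quoted Group0 matrix from its visible pattern (assumption)."""
    return [[13 if i == j else 20 + 2 * abs(i - j) for j in range(n)]
            for i in range(n)]


def merge_pairs(dist):
    """Merge objects (0,1), (2,3), ... and average the distances between groups.

    The distance between two groups is the mean of the four pairwise
    distances between their members, truncated to an integer (assumption
    chosen because it matches the quoted values, e.g. 17 instead of 17.5).
    """
    n = len(dist) // 2
    merged = []
    for a in range(n):
        row = []
        for b in range(n):
            total = sum(dist[i][j]
                        for i in (2 * a, 2 * a + 1)
                        for j in (2 * b, 2 * b + 1))
            row.append(total // 4)
        merged.append(row)
    return merged


group0 = group0_matrix()
group1 = merge_pairs(group0)   # 12 x 12
group2 = merge_pairs(group1)   # 6 x 6
group3 = merge_pairs(group2)   # 3 x 3

for row in group3:
    print(" ".join(str(v) for v in row))
```

Under these assumptions the sketch yields the Group1, Group2 and Group3
matrices as quoted, including the distance of 52 between the first and the
third Group3 versus 36 between neighbours.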