Doh! I forgot to add hwloc-devel before hitting send. Brice / Samuel - see below.
Sent from my phone. No type good.

On Nov 2, 2011, at 8:40 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:

> Chris -
>
> I totally missed this email; sorry! I'm cc-ing hwloc-devel to see if
> brice/Samuel can fix.
>
> I'm assuming we'll need this fix in the 1.2 hwloc branch as well. (I'm also
> assuming that the trunk referred to here is the OMPI trunk, now the hwloc
> trunk).
>
> Sent from my phone. No type good.
>
> On Oct 26, 2011, at 6:15 AM, "Christopher Yeoh" <cy...@au1.ibm.com> wrote:
>
> > Hi Jeff,
> >
> > Brad mentioned you might be able to help me with an OMPI hwloc issue
> > I'm having.
> >
> > Its occurring on a Power 5 RHEL 6.0 machine and related to the xml
> > representation of the topology. I've attached the xml to this email.
> > The problem only occurs on the trunk code.
> >
> > The part which appears to be the problem is this:
> >
> > <distances nbobjs="4" relative_depth="0" latency_base="10.000000">
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> > </distances>
> >
> > specifically with relative_depth having a value of 0, but still having
> > latency children information. In hwloc__xml_import_distances in
> > topology-xml.c there's a check that assumes there is no latency
> > information.
> >
> > Around line 634 in topology-xml.c:
> >
> >   if (nbobjs && reldepth && latbase) {
> >     ... process latency xml nodes
> >   }
> >
> >   return hwloc__xml_import_close_tag(state);
> >
> > The hwloc__xml_import_close_tag function returns a failure because the
> > latency nodes have not been processed yet.
> >
> > I had a look in orted where the xml is created and it does look like
> > the xml is being assembled correctly as per the topology information it
> > has retrieved (though I don't know if that itself is correct). The
> > hwloc__xml_export_object function will quite happily create distance
> > information if the relative depth is 0 even though
> > hwloc__xml_import_distance will not be able to parse it.
> >
> > So there is at least a problem that the topology code will create xml
> > that it can't parse, but I don't know enough about the hwloc library to
> > know if relative depth should always be positive. I suspect its the
> > former which is the problem not the latter, but I don't know for sure...
> >
> > If it helps, this is the output of lstopo on the machine:
> >
> > cyeoh@p5-40-P4-E0:~$ /home/OpenHPC/hwloc/build/bin/lstopo
> > Machine (2048MB)
> >   NUMANode L#0 (P#0 512MB)
> >     Socket L#0 + L1 L#0 (32KB) + Core L#0
> >       PU L#0 (P#0)
> >       PU L#1 (P#1)
> >     Socket L#1 + L1 L#1 (32KB) + Core L#1
> >       PU L#2 (P#2)
> >       PU L#3 (P#3)
> >   NUMANode L#1 (P#1 640MB)
> >   NUMANode L#2 (P#2 512MB)
> >   NUMANode L#3 (P#3 384MB)
> >
> > Regards,
> >
> > Chris
> > --
> > cy...@au.ibm.com
> > <fandango_hwloc_xml.txt>
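
For anyone skimming the quoted report, the failure mode Chris describes can be sketched in a few lines of standalone C. This is not hwloc code: `mock_state`, `mock_close_tag`, `mock_read_latencies` and both `import_distances_*` functions are hypothetical stand-ins that only mimic the behaviour described above (a close-tag step that fails while child elements are still unread). The second variant, which tracks whether the attribute was present separately from its value, is just one possible way to accept relative_depth="0". It is an assumption for illustration, not necessarily the right fix for topology-xml.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for the XML parser state: it only tracks how many
 * <latency> children are still unread under the current <distances> tag. */
struct mock_state {
    size_t unread_children;
};

/* Mimics the behaviour described for hwloc__xml_import_close_tag:
 * closing fails (-1) if child elements are still unread. */
static int mock_close_tag(const struct mock_state *state)
{
    return state->unread_children == 0 ? 0 : -1;
}

/* Consume up to nbobjs * nbobjs latency children. */
static void mock_read_latencies(struct mock_state *state, unsigned long nbobjs)
{
    size_t expected = (size_t)nbobjs * (size_t)nbobjs;
    if (expected > state->unread_children)
        expected = state->unread_children;
    state->unread_children -= expected;
}

/* Guard as quoted in the report: reldepth == 0 is indistinguishable from
 * "attribute missing", so the latency children are skipped. */
static int import_distances_as_reported(struct mock_state *state,
                                        unsigned long nbobjs,
                                        unsigned long reldepth,
                                        float latbase)
{
    if (nbobjs && reldepth && latbase)
        mock_read_latencies(state, nbobjs);
    return mock_close_tag(state);
}

/* Assumed alternative (not the actual hwloc fix): test a separate
 * "attribute was present" flag instead of the value of reldepth. */
static int import_distances_with_flag(struct mock_state *state,
                                      unsigned long nbobjs,
                                      int have_reldepth,
                                      unsigned long reldepth,
                                      float latbase)
{
    (void)reldepth; /* the value 0 is accepted; only presence is checked */
    if (nbobjs && have_reldepth && latbase)
        mock_read_latencies(state, nbobjs);
    return mock_close_tag(state);
}
```

With 16 unread children (nbobjs = 4) and reldepth = 0, `import_distances_as_reported` leaves all 16 unread and returns -1, which matches the close-tag failure in the report. `import_distances_with_flag` called with have_reldepth = 1 consumes them and returns 0.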