On 08/09/2016 19:17, Brice Goglin wrote:
>> By the way, is it expected that binding will be slow on it?  hwloc-bind
>> is ~10 times slower (~1s) than on two-socket sandybridge, and ~3 times
>> slower than on a 128-core, 16-socket system.
> Binding itself shouldn't be slower. But hwloc's topology discovery
> (which is performed by hwloc-bind before actual binding) is slower on
> KNL than on "normal" nodes. The overhead is basically linear with the
> number of hyperthreads, and KNL sequential perf is lower than your other
> nodes.
> The easy fix is to export the topology to XML with lstopo foo.xml and
> then tell all hwloc users to load from XML:
> export HWLOC_XMLFILE=foo.xml
> https://www.open-mpi.org/projects/hwloc/doc/v1.11.4/a00030.php#faq_xml
> For hwloc 2.0, I am trying to make sure we don't perform useless
> discovery steps. hwloc-bind (and many applications) don't require all
> topology details. v1.x gathers everything and filters things out later.
> For 2.0, the plan is instead to gather only what we need. What you can
> try for fun is:
> export HWLOC_COMPONENTS=-x86 (without the above XML env var)
> It disables the x86-specific discovery which is useless for most cases
> on Linux.

Interesting: this last idea doesn't help. Loading from XML is much faster
(0.14s), but normal discovery still takes about 1s even with the
x86-specific code disabled.
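
For anyone who wants to repeat the comparison, here is a rough sketch of how the three setups could be timed. It assumes lstopo and hwloc-bind are in PATH; the topo.xml file name, the core:0 location and the "true" command are only placeholders. HWLOC_THISSYSTEM=1 is added in the XML case on the assumption, as I recall from the hwloc documentation, that it is needed to tell hwloc the XML file describes the local machine so that binding is actually applied.

```shell
#!/bin/bash
# Sketch only: time hwloc-bind under three discovery setups.
# Assumptions: hwloc 1.11.x tools in PATH; topo.xml, core:0 and "true"
# are illustrative placeholders, not values from the original thread.

have_hwloc() {
    command -v lstopo >/dev/null 2>&1 && command -v hwloc-bind >/dev/null 2>&1
}

compare_discovery() {
    # Use a fresh directory so lstopo never has to overwrite an existing file.
    dir=$(mktemp -d) || return 1
    xml="$dir/topo.xml"
    # Export the topology to XML (format is chosen from the .xml extension).
    lstopo "$xml" || return 1

    echo "== default discovery =="
    time hwloc-bind core:0 -- true

    echo "== x86-specific discovery disabled =="
    time env HWLOC_COMPONENTS=-x86 hwloc-bind core:0 -- true

    echo "== topology loaded from XML =="
    # HWLOC_THISSYSTEM=1 is an assumption, see the note above.
    time env HWLOC_XMLFILE="$xml" HWLOC_THISSYSTEM=1 hwloc-bind core:0 -- true

    rm -rf "$dir"
}

if have_hwloc; then
    compare_discovery || echo "measurement failed"
else
    echo "lstopo/hwloc-bind not found in PATH, nothing to measure"
fi
```

The timings are printed by the shell's `time` keyword on stderr; a single run is noisy, so repeating each case a few times gives a fairer picture.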

So what's really slow is reading sysfs and/or inserting all hwloc
objects in the tree. I need to do some profiling. And I am moving the
item "parallelize the discovery" higher in the TODO list :)


hwloc-users mailing list
