On 12/09/2016 04:20, Brice Goglin wrote:
> So what's really slow is reading sysfs and/or inserting all hwloc
> objects in the tree. I need to do some profiling. And I am moving the
> item "parallelize the discovery" higher in the TODO list :)
>
> Brice


I ran more benchmarks. What's really slow is the reading of all sysfs
files. About 90% of the topology building time is spent there on KNL.
We're reading more than 7000 files (mostly 6 files per hardware thread
and 6 files per cache).
Reading from sysfs is significantly slower than reading normal files
that are cached (not surprising since the kernel doesn't cache sysfs
file contents).
Reading on KNL is also about 30 times slower than on my laptop (70us for
each sysfs file). I don't know why.
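
A rough way to measure the per-file cost yourself, as a minimal Python
sketch. This is not the profiling code behind the numbers above. The
function name is hypothetical, and the glob pattern is just the usual
Linux location of the per-CPU topology attributes, chosen here as an
assumption about which files are worth timing.

```python
import glob
import os
import time


def mean_read_time_us(paths, repeats=5):
    """Mean wall-clock time, in microseconds, to open+read+close one file."""
    if not paths:
        raise ValueError("no paths to read")
    start = time.perf_counter()
    for _ in range(repeats):
        for path in paths:
            with open(path, "rb") as f:
                f.read()
    elapsed = time.perf_counter() - start
    return elapsed * 1e6 / (repeats * len(paths))


def readable_sysfs_topology_files():
    """Per-CPU topology attributes under sysfs (assumed location, Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu*/topology/*"
    return [p for p in glob.glob(pattern)
            if os.path.isfile(p) and os.access(p, os.R_OK)]


if __name__ == "__main__":
    files = readable_sysfs_topology_files()
    if files:
        print(f"{len(files)} sysfs files, "
              f"{mean_read_time_us(files):.1f} us per read")
    else:
        print("no readable sysfs topology files found on this system")
```

Running the same function on a list of ordinary files that are already
in the page cache gives the baseline to compare against.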

And if you have one process doing this on each core simultaneously,
things become up to 30x slower.
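
To put these figures together, here is a back-of-the-envelope estimate
in Python. It assumes the 70us figure is the per-file cost on KNL, uses
7000 as a round file count, and treats the 30x slowdown as the worst
case; the function name is hypothetical.

```python
def estimate_sysfs_seconds(n_files, per_file_us, slowdown=1.0):
    """Estimated time spent reading sysfs, in seconds."""
    return n_files * per_file_us * slowdown / 1e6


# One process alone on the machine: roughly half a second.
alone = estimate_sysfs_seconds(7000, 70)

# One process per core at the same time, worst case (30x slower).
contended = estimate_sysfs_seconds(7000, 70, slowdown=30)

print(f"alone: {alone:.2f} s, contended: {contended:.2f} s")
```

Under these assumptions a single discovery spends about 0.49 s in sysfs,
and up to about 14.7 s when every core does it simultaneously.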

Looks like XML is really the way to go on these platforms. One thing
that XML currently misses is cgroup support. You need to export the XML
inside the same cgroup or the topology will be wrong. I am adding an
option to read the current cgroup restrictions from the OS and apply them
to an XML-imported topology (the XML must be created outside of all cgroups).
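
The idea of applying the current restrictions to a full topology can be
illustrated with a small Python sketch. This is not how hwloc implements
it. It assumes Linux, takes the "Cpus_allowed_list" field of
/proc/self/status as a stand-in for the cgroup restrictions (that field
also reflects any affinity mask set on the process), and all function
names are hypothetical.

```python
def parse_cpulist(text):
    """Parse a Linux CPU list such as '0-3,8' into a set of integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(status_path="/proc/self/status"):
    """CPUs the current process may run on, as reported by the kernel."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpulist(line.split(":", 1)[1])
    raise RuntimeError("Cpus_allowed_list not found in " + status_path)


def restrict(topology_cpus, allowed):
    """Keep only the CPUs of the imported topology that are allowed here."""
    return sorted(set(topology_cpus) & set(allowed))
```

For example, restrict(range(8), parse_cpulist("2-3,9")) keeps only CPUs
2 and 3 of an 8-CPU topology that was exported outside of all cgroups.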


hwloc-users mailing list
