Hi all,

I use hwloc-distrib quite often to distribute jobs optimally on NUMA boxes. I use it to test the Linux kernel task scheduler by comparing the runtime of jobs bound to the best possible CPU configuration (with the CPU caches in mind) against runs with no CPU affinity set.
I just ran into a strange issue on a box with Intel's newest Nehalem CPUs. It has 4 sockets, each with 8 physical cores, and hyper-threading is enabled, which gives 64 OS processors. The box has a strange NUMA layout (I will need to check why). Basically, there are 3 NUMA nodes: one contains 2 sockets, and each of the other two contains a single socket.

hwloc-distrib --single 8 distributes the jobs as follows:

    3 jobs on NUMANode #0
    3 jobs on NUMANode #1
    2 jobs on NUMANode #2

These are the commands I used:

    lstopo 64.pdf
    for A in $(hwloc-distrib --single 8); do taskset ${A} sleep 100 & done
    lstopo --top top.pdf

hwloc-distrib is in fact doing it right, but this is not what I want: it is not the best configuration once you take the CPU caches into account. NUMANode #1 holds only one socket, so its 3 jobs all share that socket's cache, while 8 jobs on 4 sockets could run 2 per socket.

I have figured out the following way to tell hwloc-distrib to ignore NUMA nodes when computing the CPU distribution:

    lstopo --ignore NUMANode No_NUMA.xml
    for A in $(hwloc-distrib --xml No_NUMA.xml --single 8); do taskset ${A} sleep 100 & done
    lstopo --top fix.pdf

I'm wondering if there is a better way to make "Socket" the top-level object. Something like

    hwloc-distrib --ignore NUMANode --single 8

or

    hwloc-distrib --top_level Socket --single 8

would be very useful. Is there something like this already? If not, would you consider it as an enhancement?

Thanks!
Jirka
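To illustrate why the NUMA level changes the result, here is a small bash model of a top-down split. It assumes (an assumption, not taken from hwloc's source) that jobs are divided as evenly as possible among an object's children, with leftover jobs going to the first children; that assumption reproduces the 3/3/2 split reported for this box. `split_even` is a hypothetical helper, and the socket counts per NUMA node come from the layout described in this message.

```shell
#!/usr/bin/env bash
# Hypothetical model (assumption, not hwloc's code): split N jobs as evenly
# as possible among K children, giving leftover jobs to the first children.
split_even() {
  local n=$1 k=$2 i
  local base=$(( n / k )) extra=$(( n % k ))
  local out=()
  for (( i = 0; i < k; i++ )); do
    if (( i < extra )); then
      out+=( $(( base + 1 )) )
    else
      out+=( "$base" )
    fi
  done
  echo "${out[*]}"
}

# Level 1: 8 jobs over the 3 NUMA nodes of this box.
read -r -a per_node <<< "$(split_even 8 3)"
echo "per NUMA node: ${per_node[*]}"

# Level 2: NUMANode #0 holds 2 sockets; #1 and #2 hold 1 socket each.
sockets_per_node=(2 1 1)
per_socket=()
for (( j = 0; j < ${#per_node[@]}; j++ )); do
  read -r -a part <<< "$(split_even "${per_node[j]}" "${sockets_per_node[j]}")"
  per_socket+=( "${part[@]}" )
done
echo "per socket (NUMA on top):   ${per_socket[*]}"

# With NUMA nodes ignored, the 4 sockets are direct children of the machine.
echo "per socket (Socket on top): $(split_even 8 4)"
```

Under this model the NUMA-first split gives 2, 1, 3 and 2 jobs per socket, whereas putting sockets on top gives 2 jobs on every socket, which is the intent of the --ignore NUMANode workaround.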