Hi, Is this correct?
>
>> e.g.: CHIP0 has cores 0,1,2,8,9,10 and so does CHIP1,
>> but there is no core labelled 3,4,5,6,7 and 11
>> also the cpu numbering seems 'staggered' to me -
>> e.g. chip0/core0/{cpu1,cpu13) and chip1/core0/{cpu0,cpu12}
>> rather than something more like:
>> chip0/core0-5/cpu0-11 & chip1/core6-11/cpu12-23
>>
> There is no standard on how the cpu's are numbered. The cpu1-cpu13,
> cpu0-cpu12,etc grouping of HT threads is ok because I tested the HT passive
> scheduling (the first part of my project) and it seems to give good
> results. The core numbers are extracted from the APICIDs (the core_bits and
> logical_bits are ok). We dumped the APICIDs with acpidump and they are ok.
> I haven't look much at this issue.
>

And here is the proof (sorry for the ugly printing):

xeon28# bash openssl_bench.sh 12 12

###### STARTING openssl speed rsa512 with kern.usched_bsd4.ht_enable=0 ######
no_smt11   no_smt12   no_smt13   no_smt14   no_smt15   no_smt16   no_smt17   no_smt18   no_smt19   no_smt110  no_smt111  no_smt112
----------------------------------------------------------------------------------------------------------------------------------
0.000098s  0.000160s  0.000160s  0.000160s  0.000160s  0.000160s  0.000099s  0.000161s  0.000160s  0.000098s  0.000160s  0.000098s
0.000095s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000156s  0.000095s  0.000095s  0.000095s  0.000156s
0.000095s  0.000095s  0.000095s  0.000095s  0.000095s  0.000095s  0.000155s  0.000095s  0.000156s  0.000155s  0.000095s  0.000156s
0.000155s  0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000095s  0.000095s  0.000155s  0.000155s  0.000096s  0.000155s
0.000155s  0.000095s  0.000155s  0.000097s  0.000155s  0.000155s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000095s
0.000155s  0.000095s  0.000155s  0.000155s  0.000155s  0.000155s  0.000095s  0.000096s  0.000156s  0.000095s  0.000095s  0.000096s
0.000155s  0.000095s  0.000155s  0.000096s  0.000095s  0.000095s  0.000095s  0.000154s  0.000095s  0.000095s  0.000095s  0.000156s
0.000155s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000155s
0.000155s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000155s  0.000095s  0.000096s  0.000095s  0.000156s  0.000156s
0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000155s  0.000096s  0.000095s  0.000155s  0.000095s  0.000095s  0.000096s
0.000156s  0.000095s  0.000095s  0.000154s  0.000154s  0.000152s  0.000095s  0.000095s  0.000154s  0.000156s  0.000095s  0.000095s
0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000156s  0.000155s  0.000095s  0.000095s
----------------------------------------------------------------------------------------------------------------------------------

###### STARTING openssl speed rsa512 with kern.usched_bsd4.ht_enable=1 ######
smt11      smt12      smt13      smt14      smt15      smt16      smt17      smt18      smt19      smt110     smt111     smt112
----------------------------------------------------------------------------------------------------------------------------------
0.000099s  0.000096s  0.000096s  0.000096s  0.000099s  0.000099s  0.000096s  0.000099s  0.000096s  0.000096s  0.000099s  0.000099s
0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s
0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s
0.000096s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000096s
0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000095s  0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000095s  0.000096s  0.000095s  0.000096s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000097s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000097s  0.000097s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000096s
0.000097s  0.000097s  0.000095s  0.000095s  0.000097s  0.000095s  0.000095s  0.000095s  0.000159s  0.000158s  0.000097s  0.000097s
----------------------------------------------------------------------------------------------------------------------------------

As you can see, with HT enabled the results are much more consistent than
with HT disabled: with ht_enable=0 many runs take about 0.000155s instead of
about 0.000095s, while with ht_enable=1 almost all runs stay between
0.000095s and 0.000099s. So the CPU topology is detected ok.

The cache coherence heuristics do not use the cpu topology. They try to
schedule a process on the CPU it last ran on. We do not search through the
topology to find the best fit (the process whose old cpu is closest). The
main reason is that we are in a locked region, and if we did advanced
searching there, that region would become heavily contended when the system
has a lot of runnable processes. I also did some tests with it, and the
results got worse.

Mihai.
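
For illustration only, the "old CPU" preference described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the usched_bsd4 code: struct proc_entry, its fields, and pick_process() are hypothetical names invented here, and the exact-match scan stands in for whatever the real heuristic does. The point is only that the choice compares last_cpu with the current cpu and never walks the chip/core topology.

```c
#include <stddef.h>

/* Hypothetical run queue entry; the field names are assumptions. */
struct proc_entry {
    int pid;       /* process id */
    int last_cpu;  /* cpu the process last ran on (its "old cpu") */
};

/*
 * Pick a process for this_cpu from a run queue of n entries.
 * Prefer the first entry whose old cpu is this_cpu, since its cache
 * contents may still be warm. If there is none, take the head of the
 * queue. No topology search is done, so the work performed while the
 * queue is locked stays a simple linear scan.
 * Returns the index of the chosen entry, or -1 if the queue is empty.
 */
int
pick_process(const struct proc_entry *queue, size_t n, int this_cpu)
{
    for (size_t i = 0; i < n; i++) {
        if (queue[i].last_cpu == this_cpu)
            return (int)i;
    }
    return n > 0 ? 0 : -1;
}
```

With a queue holding processes that last ran on cpus 3, 1 and 1, a call for cpu 1 picks the second entry, and a call for cpu 7 falls back to the head of the queue.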