Hello,

This week I added support for CPU topology detection on 32-bit platforms. I also ran some tests to reveal the weakness of usched_bsd4 on a hyperthreaded CPU topology.

For these tests I used the "openssl speed" utility, which measures the speed of various cryptographic operations. A single process can easily stress one CPU to 100%. In particular I ran "openssl speed rsa512" and took the average time needed to sign from its output. This figure is related to the amount of work "openssl speed rsa512" completes in 10 seconds, which is not relevant here; we are only interested in the "time to sign" metric.

The following tests were run on a machine with 2 cores, each core having 2 threads:

1) One "openssl speed rsa512" process. Over 10 runs, the time to sign was a constant 0.000060s.
2) Two "openssl speed rsa512" processes. Over 10 runs, the time to sign varied from 0.000060s to 0.000110s, depending on which CPUs the processes ran on.
3) Four "openssl speed rsa512" processes. Over 10 runs, the time to sign was 0.000110s for each of them.

The problem is in test 2). The two processes may or may not be scheduled on the same core, depending on luck. As test 3) shows, the time to sign with all four threads busy (0.000110s) is the same as the time seen in test 2) when both processes ran on the same core.

The first goal is for these processes to make use of the full power of the platform: they should be scheduled on two different cores, so that the time to sign is a constant 0.000060s for each of them.

Any feedback is welcome!

Thanks,
Mihai Carabas
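
To make the goal for test 2) concrete, here is a minimal sketch in C of the placement preference: take an idle thread on a fully idle core first, and fall back to an idle thread on a busy core only when no such core exists. It is only an illustration, not the usched_bsd4 code. The names choose_cpu, sibling_of and busy_mask are hypothetical, and the sketch assumes that logical CPUs 2k and 2k+1 are the two threads of core k, which a real detected topology need not follow.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every core exposes exactly two hardware threads. */
#define THREADS_PER_CORE 2

/*
 * Hypothetical layout: logical CPUs 2k and 2k+1 share core k, so the
 * sibling of a CPU is found by flipping its lowest bit. A real scheduler
 * would take this from the detected CPU topology instead.
 */
static int
sibling_of(int cpu)
{
	return cpu ^ 1;
}

/*
 * Pick a logical CPU for a newly runnable process.
 * Bit i of busy_mask is set when logical CPU i is already running a process.
 * Preference order:
 *   1. an idle CPU whose sibling is idle too (the whole core is free),
 *   2. any idle CPU (its core is shared with a busy sibling),
 *   3. -1 when every CPU is busy.
 */
int
choose_cpu(uint32_t busy_mask, int ncpus)
{
	int cpu;
	int fallback = -1;

	assert(ncpus > 0 && ncpus <= 32 && ncpus % THREADS_PER_CORE == 0);

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (busy_mask & (UINT32_C(1) << cpu))
			continue;		/* this thread is busy */
		if (!(busy_mask & (UINT32_C(1) << sibling_of(cpu))))
			return cpu;		/* whole core is idle */
		if (fallback < 0)
			fallback = cpu;		/* idle thread, busy core */
	}
	return fallback;
}
```

On the 2-core, 4-thread machine above, with one process already on logical CPU 0 (busy_mask 0x1), choose_cpu(0x1, 4) returns 2, so the second process lands on the other core instead of on CPU 1, the sibling of the busy thread.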