* Lai Jiangshan ([email protected]) wrote:
> On 08/25/2011 04:35 PM, Paolo Bonzini wrote:
> > On 08/25/2011 10:00 AM, Lai Jiangshan wrote:
> >>> >I was measuring with 10 readers, not 3. It makes sense to wait more
> >>> >with fewer readers.
> >>
> >> But my box just has 4 cores(i5 760).
> >
> > You cannot be always sure that readers are less than the cores. readers
> > > cores is exactly the case when busy waiting hurts most.
> >
> >
> It makes no sense to do a "readers > cores" *performance* rcutorture test.

I think it actually does make sense to benchmark this use case. One
major difference between Userspace RCU and kernel code is that Userspace
RCU has to handle workloads that are sometimes ill-suited to the system
configuration, and still behave reasonably well.

> When "readers > cores", the kernel scheduler will mess the test up.

I agree that the kernel scheduler becomes heavily involved in these
tests. However, if we keep the same scheduler configuration between
tests and change only the URCU algorithm, we can compare the impact of
URCU well enough.

So what I'm trying to say is: I agree with you that we primarily need to
optimize for the performance of the "ideal" configuration (n threads for
n CPUs). But we also need to consider the cases where there are more
threads than CPUs. Even though that is not the configuration we mainly
optimize for, we should not degrade its performance more than necessary
for the sake of very small gains in the ideal configuration.

Thanks,

Mathieu

>
> Thanks,
> Lai.

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
