On Mon, 1 Oct 2007, Bruce Evans wrote:

> On Sun, 30 Sep 2007, Jeff Roberson wrote:

>> On Sat, 29 Sep 2007, Kevin Oberman wrote:

>>> YMMV, but ULE seems to generally work better than 4BSD for interactive
>>> uniprocessor systems. The preferred scheduler for uniprocessor servers
>>> is less clear, but many tests have shown ULE does better for those
>>> systems in the majority of cases.

>> I feel it's safe to say desktop behavior on UP is definitely superior.

> This is unsafe to say.

>> I think there is no significant difference on UP between 4BSD and ULE

> This may be safe to say, but is inconsistent with the above.

>> except perhaps in context switching microbenchmarks where ULE falls behind.

> It is safe to say that interactive users cannot notice insignificant
> differences.  It takes a micro-benchmark to notice possibly-significant
> differences of hundreds or even thousands of nanoseconds for context
> switching.

Well, speaking of context switch microbenchmarks...

I recently looked at lmbench but was dissatisfied with the way it measures. Specifically, I want to see how context switch times scale as you add lots of threads that are running concurrently. The #procs argument to lat_ctx does not run these processes concurrently; each is woken in turn as a token passes through a chain of pipes.
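
Schematically, the lat_ctx arrangement looks something like this (a from-memory sketch of the technique, not lmbench's actual source):

/*
 * Sketch of the lat_ctx scheme: N processes in a ring of pipes, each
 * reading a one-byte token and passing it to the next.  Only the
 * current token holder is ever runnable, so the processes take turns
 * rather than running concurrently.
 */
#include <sys/wait.h>

#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    int nprocs = 4, rounds = 10000;
    int (*p)[2];
    int i, r;
    char tok = 't';

    p = malloc(nprocs * sizeof(*p));
    for (i = 0; i < nprocs; i++)
        if (pipe(p[i]) != 0)
            err(1, "pipe");
    for (i = 1; i < nprocs; i++) {
        if (fork() == 0) {
            /* Relay: wait for the token, hand it on, exit on poison. */
            for (;;) {
                read(p[i - 1][0], &tok, 1);
                write(p[i][1], &tok, 1);
                if (tok == 0)
                    _exit(0);
            }
        }
    }
    for (r = 0; r < rounds; r++) {
        /* The parent injects the token and waits for it to return. */
        write(p[0][1], &tok, 1);
        read(p[nprocs - 1][0], &tok, 1);
    }
    tok = 0;        /* poison token: shut the relays down */
    write(p[0][1], &tok, 1);
    read(p[nprocs - 1][0], &tok, 1);
    while (wait(NULL) > 0)
        ;
    return (0);
}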

I wrote a simple tool that does a given number of switches with a given number of processes. I then simply time the total execution with time(1). This avoids the overhead of pipes, sleep/wakeup, and other complexities; instead, it uses sched_yield(). The tool is available at:

http://people.freebsd.org/~jeff/yield.c, and yield.sh is what I have been using to drive the measurements.
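
The core of the tool is roughly the following (a simplified sketch; the real yield.c at the URL above is authoritative and may differ in details):

/*
 * Fork N workers that each call sched_yield() in a loop, so every
 * switch goes directly from one runnable process to another through
 * the scheduler, with no pipes or sleep/wakeup in the way.  Run the
 * whole thing under time(1).
 */
#include <sys/wait.h>

#include <err.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    int nprocs, nswitches, i, j;

    if (argc != 3)
        errx(1, "usage: yield nprocs nswitches");
    nprocs = atoi(argv[1]);
    nswitches = atoi(argv[2]);
    for (i = 0; i < nprocs; i++) {
        if (fork() == 0) {
            for (j = 0; j < nswitches; j++)
                sched_yield();
            _exit(0);
        }
    }
    while (wait(NULL) > 0)    /* reap all workers */
        ;
    return (0);
}

With this sketch, 'time ./yield 100 100000' forks 100 workers doing 100,000 yields each.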

I found that ULE on UP was 10% slower than 4BSD at 1 and 10 concurrent threads and 5% slower at 100. It broke even at 1000 and was about 22% faster at 5,000. Then I wrote:

http://people.freebsd.org/~jeff/ulefaster.diff

This is indistinguishable from 4BSD at 1, 10, 100, and 1000 threads while being 24% faster at 5,000. The 5,000 case is anomalous; I think after 100 we must no longer fit in cache, and at 5,000 the time to fork() and wait() actually shows up significantly. Here's output for 4BSD on UP:

    procs     real(s)    user(s)     sys(s)
        1        5.69       1.17       4.48
       10        7.66       1.60       6.02
      100        8.37       1.90       6.43
     1000       37.96      14.28      23.26
     5000       68.50      14.16      45.20

And ULE with the above patch:

    procs     real(s)    user(s)     sys(s)
        1        5.62       1.23       4.36
       10        7.73       1.97       5.74
      100        8.34       2.01       6.30
     1000       38.00      13.60      24.20
     5000       52.42      13.84      38.32

I did multiple runs but didn't average them; they always landed in the same ballpark, and the patch made such a significant difference that I didn't bother to record and analyze each run.

On SMP, ULE pays a price for the per-CPU run queue locks. How well does that pay off? Here's ULE on an 8-core Opteron:

    procs     real(s)    user(s)     sys(s)
        1        3.91       0.35       3.55
       10        1.70       0.44       6.63
      100        1.25       1.77       8.10
     1000        4.49      14.46      21.43
     5000       14.32      25.58      88.07

And 4BSD on the same:

    procs     real(s)    user(s)     sys(s)
        1       39.38       0.59      38.77
       10       62.47       0.84     493.07
      100       66.42      12.23     517.77
     1000       69.38      25.13     523.52
     5000      131.33      33.33     930.52

The combination of reduced scheduler locking and improved cache affinity pays off at about 10x the switch throughput of 4BSD. The actual cost of ULE's extra synchronization is about a 5% penalty, as measured with smp.disabled=1; however, I lost that data and am not interested in rebooting three more times to reclaim it.
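
To see why the per-CPU locks win under load, here's a toy contention demo (my own illustration, nothing to do with the actual scheduler code): each thread either hammers one shared mutex, standing in for 4BSD's single run queue lock, or a mutex of its own, standing in for ULE's per-CPU locks.

/*
 * Toy lock-contention demo.  "global" mode: all threads lock the same
 * mutex.  "percpu" mode: each thread locks its own mutex, so there is
 * no contention at all.  Build with: cc -O2 -pthread
 */
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS    8
#define NITERS      5000000

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t percpu_lock[NTHREADS];
static int use_global;

static void *
worker(void *arg)
{
    pthread_mutex_t *m;
    int i;

    m = use_global ? &global_lock : &percpu_lock[(long)arg];
    for (i = 0; i < NITERS; i++) {
        /* The lock/unlock pair stands in for a run queue operation. */
        pthread_mutex_lock(m);
        pthread_mutex_unlock(m);
    }
    return (NULL);
}

int
main(int argc, char **argv)
{
    pthread_t t[NTHREADS];
    long i;

    use_global = (argc > 1 && argv[1][0] == 'g');
    for (i = 0; i < NTHREADS; i++) {
        pthread_mutex_init(&percpu_lock[i], NULL);
        pthread_create(&t[i], NULL, worker, (void *)i);
    }
    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return (0);
}

Compare 'time ./a.out global' against 'time ./a.out percpu'; the gap grows with core count, which is the same effect the numbers above show.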

Cheers,
Jeff


> ULE may give higher priority to interactive processes, but most loss of
> interactivity is caused by blocking on I/O, and there is nothing
> a scheduler can do to speed up slow or overloaded devices.

> Bruce
