On Mon, 1 Oct 2007, Bruce Evans wrote:

> On Sun, 30 Sep 2007, Jeff Roberson wrote:

>> On Sat, 29 Sep 2007, Kevin Oberman wrote:

>>> YMMV, but ULE seems to generally work better than 4BSD for interactive
>>> uniprocessor systems. The preferred scheduler for uniprocessor servers
>>> is less clear, but many tests have shown ULE does better for those
>>> systems in the majority of cases.

>> I feel it's safe to say desktop behavior on UP is definitely superior.

> This is unsafe to say.

>> I think there is no significant difference on UP between 4BSD and ULE

> This may be safe to say, but is inconsistent with the above.

>> except perhaps in context switching microbenchmarks where ULE falls behind.

> It is safe to say that interactive users cannot notice insignificant
> differences.  It takes a micro-benchmark to notice possibly-significant
> differences of hundreds or even thousands of nanoseconds for context
> switching.

Well, speaking of context switch microbenchmarks...

I recently looked at lmbench but was dissatisfied with the way it measures. Specifically, I want to see how context switch times scale as you add lots of threads that are running concurrently. The #procs argument to lat_ctx does not run these processes concurrently; each is woken in turn as a token passes through a chain of pipes.
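
Schematically, the lat_ctx arrangement looks something like this (a from-memory sketch of the technique, not lmbench's actual source):

/*
 * Sketch of the lat_ctx scheme: N processes in a ring of pipes, each
 * reading a one-byte token and passing it to the next.  Only the
 * current token holder is ever runnable, so the processes take turns
 * rather than running concurrently.
 */
#include <sys/wait.h>

#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    int nprocs = 4, rounds = 10000;
    int (*p)[2];
    int i, r;
    char tok = 't';

    p = malloc(nprocs * sizeof(*p));
    for (i = 0; i < nprocs; i++)
        if (pipe(p[i]) != 0)
            err(1, "pipe");
    for (i = 1; i < nprocs; i++) {
        if (fork() == 0) {
            /* Relay: wait for the token, hand it on, exit on poison. */
            for (;;) {
                read(p[i - 1][0], &tok, 1);
                write(p[i][1], &tok, 1);
                if (tok == 0)
                    _exit(0);
            }
        }
    }
    for (r = 0; r < rounds; r++) {
        /* The parent injects the token and waits for it to return. */
        write(p[0][1], &tok, 1);
        read(p[nprocs - 1][0], &tok, 1);
    }
    tok = 0;        /* poison token: shut the relays down */
    write(p[0][1], &tok, 1);
    read(p[nprocs - 1][0], &tok, 1);
    while (wait(NULL) > 0)
        ;
    return (0);
}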

I wrote a simple tool that does a given number of switches with a given number of processes. I then simply time the total execution with time(1). This avoids the overhead of pipes, sleep/wakeup, and other complexities; instead, it uses sched_yield(). The tool is available at:

http://people.freebsd.org/~jeff/yield.c, and yield.sh is what I have been using to drive the measurements.
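
The core of the tool is roughly the following (a simplified sketch; the real yield.c at the URL above is authoritative and may differ in details):

/*
 * Fork N workers that each call sched_yield() in a loop, so every
 * switch goes directly from one runnable process to another through
 * the scheduler, with no pipes or sleep/wakeup in the way.  Run the
 * whole thing under time(1).
 */
#include <sys/wait.h>

#include <err.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    int nprocs, nswitches, i, j;

    if (argc != 3)
        errx(1, "usage: yield nprocs nswitches");
    nprocs = atoi(argv[1]);
    nswitches = atoi(argv[2]);
    for (i = 0; i < nprocs; i++) {
        if (fork() == 0) {
            for (j = 0; j < nswitches; j++)
                sched_yield();
            _exit(0);
        }
    }
    while (wait(NULL) > 0)    /* reap all workers */
        ;
    return (0);
}

With this sketch, 'time ./yield 100 100000' forks 100 workers doing 100,000 yields each.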

I found that ULE on UP was 10% slower than 4BSD at 1 and 10 concurrent threads and 5% slower at 100. It broke even at 1000 and was about 22% faster at 5,000. Then I wrote:

http://people.freebsd.org/~jeff/ulefaster.diff

This is indistinguishable from 4BSD at 1, 10, 100, and 1000 threads while being 24% faster at 5,000. The 5,000 case is anomalous; I think after 100 we must no longer fit in cache, and at 5,000 the time to fork() and wait() actually shows up significantly. Here's output for 4BSD on UP:

    procs     real(s)    user(s)     sys(s)
        1        5.69       1.17       4.48
       10        7.66       1.60       6.02
      100        8.37       1.90       6.43
     1000       37.96      14.28      23.26
     5000       68.50      14.16      45.20

And ULE with the above patch:

    procs     real(s)    user(s)     sys(s)
        1        5.62       1.23       4.36
       10        7.73       1.97       5.74
      100        8.34       2.01       6.30
     1000       38.00      13.60      24.20
     5000       52.42      13.84      38.32

I did multiple runs but didn't average them; they always landed in the same ballpark, and the patch made such a significant difference that I didn't bother to record and analyze each run.

On SMP, ULE pays a price for the per-CPU run queue locks. How well does that pay off? Here's ULE on an 8-core Opteron:

    procs     real(s)    user(s)     sys(s)
        1        3.91       0.35       3.55
       10        1.70       0.44       6.63
      100        1.25       1.77       8.10
     1000        4.49      14.46      21.43
     5000       14.32      25.58      88.07

And 4BSD on the same:

    procs     real(s)    user(s)     sys(s)
        1       39.38       0.59      38.77
       10       62.47       0.84     493.07
      100       66.42      12.23     517.77
     1000       69.38      25.13     523.52
     5000      131.33      33.33     930.52

The combination of reduced scheduler locking and improved cache affinity pays off at about 10x the switch throughput of 4BSD. The actual cost of ULE's extra synchronization is about a 5% penalty, as measured with smp.disabled=1; however, I lost that data and am not interested in rebooting three more times to reclaim it.
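
To see why the per-CPU locks win under load, here's a toy contention demo (my own illustration, nothing to do with the actual scheduler code): each thread either hammers one shared mutex, standing in for 4BSD's single run queue lock, or a mutex of its own, standing in for ULE's per-CPU locks.

/*
 * Toy lock-contention demo.  "global" mode: all threads lock the same
 * mutex.  "percpu" mode: each thread locks its own mutex, so there is
 * no contention at all.  Build with: cc -O2 -pthread
 */
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS    8
#define NITERS      5000000

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t percpu_lock[NTHREADS];
static int use_global;

static void *
worker(void *arg)
{
    pthread_mutex_t *m;
    int i;

    m = use_global ? &global_lock : &percpu_lock[(long)arg];
    for (i = 0; i < NITERS; i++) {
        /* The lock/unlock pair stands in for a run queue operation. */
        pthread_mutex_lock(m);
        pthread_mutex_unlock(m);
    }
    return (NULL);
}

int
main(int argc, char **argv)
{
    pthread_t t[NTHREADS];
    long i;

    use_global = (argc > 1 && argv[1][0] == 'g');
    for (i = 0; i < NTHREADS; i++) {
        pthread_mutex_init(&percpu_lock[i], NULL);
        pthread_create(&t[i], NULL, worker, (void *)i);
    }
    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return (0);
}

Compare 'time ./a.out global' against 'time ./a.out percpu'; the gap grows with core count, which is the same effect the numbers above show.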

Cheers,
Jeff


> ULE may give higher priority to interactive processes, but most loss of
> interactivity is caused by blocking on I/O, and there is nothing
> a scheduler can do to speed up slow or overloaded devices.

> Bruce
