In message <[EMAIL PROTECTED]>, "Neil A. Carson" writes:
>It's interesting; in the past when doing these benchmarks, things like
>the pipe throughput and local latencies became totally shadowed by ctx
>sw times. I remember once a test where the remote latency was smaller
>than the local one :-)
Yeah. None of the lmbench programs I looked at make any attempt to factor out
the context switch time from what they're measuring. The tough bit is that,
faced with results like:
Host       OS            2p/0K  Pipe  AF    UDP  RPC/  TCP  RPC/  TCP
                         ctxsw        UNIX       UDP        TCP   conn
armv4l-li  Linux 2.3.9     159   325   632   598  770  1750
it's hard to know how much of the blame for the 632us AF_UNIX latency you can
lay at the door of context switching, and how much you are picking up
elsewhere.
BTW, there is of course another bad effect of the ARM cache that the
comments in the code don't mention. Because it's virtually mapped, you have to
flush the whole thing on every context switch -- even if you could do this
instantaneously, as you pretty much can on non-StrongARM machines, both the
kernel and the newly-running user program are starting out with completely
cold caches, so you get desperately bad performance for a while as everything
gets reloaded from main memory. This might actually make it worthwhile to
think about adding to the ARM6/7 support some of the code to work out when
we can get away without flushing the cache and TLB. I will investigate that
when I can find some time.
p.
unsubscribe: body of `unsubscribe linux-arm' to [EMAIL PROTECTED]