In message <[EMAIL PROTECTED]>, "Neil A. Carson" writes:
>It's interesting; in the past when doing these benchmarks, things like
>the pipe throughput and local latencies became totally shadowed by ctx
>sw times. I remember once a test where the remote latency was smaller
>than the local one :-)

Yeah.  None of the lmbench programs I looked at makes any attempt to factor 
out the context switch time from what it's measuring.  The tough bit is that, 
faced with results like:

Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
armv4l-li   Linux 2.3.9   159   325  632   598         770       1750

it's hard to know how much of the blame for the 632us AF_UNIX latency you can 
lay at the door of context switching, and how much you are picking up 
elsewhere.

BTW, there is of course another bad effect of the ARM cache that the 
comments in the code don't mention.  Because it's virtually mapped, you have 
to flush the whole thing on every context switch.  Even if you could do the 
flush instantaneously, as you pretty much can on non-StrongARM machines, both 
the kernel and the newly-running user program start out with completely cold 
caches, so you get desperately bad performance for a while as everything gets 
reloaded from main memory.  This might actually make it worthwhile to think 
about adding to the ARM6/7 support some of the code to work out when we can 
get away without flushing the cache and TLB.  I will investigate that when I 
can find some time.

p.


unsubscribe: body of `unsubscribe linux-arm' to [EMAIL PROTECTED]
