Adrian Chadd wrote this message on Wed, May 13, 2015 at 08:34 -0700:
> The reason I ask about "why is it faster?" is because for embedded-y
> things with low RAM we may not want that to happen due to memory
> constraints. However, we may actually want to do some form of
> autotuning on some platforms.

If you're already running a program, the difference between 1k and
8k isn't significant... I'll grant you that 64k can be significant for
embedded-y platforms...  But this goes back to the point that we need a
global knob that says "I want low memory usage, and I am willing to pay
for it in performance"...

> So, if it's underlying block size, maybe BUFSIZ isn't the thing to
> tweak, but based on disk io buffer size.
> If it's filling L1 or L2 cache with useful work, maybe auto-tune it
> based on that.

I'm pretty sure this is simply that syscalls+copies are expensive,
and larger block sizes reduce the number of calls: going from 1k to
64k means 64 times fewer syscalls...
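
As a rough illustration of the 64x figure (a sketch only, not the
benchmark discussed here), the following counts how many read calls it
takes to drain the same file with 1k and 64k blocks. The file size, the
helper name count_read_calls, and the use of Python's os.read as a
stand-in for read(2) are all assumptions made for the example.

```python
import os
import tempfile


def count_read_calls(path, block_size):
    """Count the os.read() calls that return data when draining a file.

    Each os.read() maps to one read(2) syscall. The final call that
    returns b"" at EOF is also a syscall but is not counted here.
    """
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    try:
        while os.read(fd, block_size):
            calls += 1
    finally:
        os.close(fd)
    return calls


# Hypothetical 1 MiB test file; the size is arbitrary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(1 << 20))
    tmp_path = tmp.name

try:
    small_calls = count_read_calls(tmp_path, 1024)        # 1k blocks
    large_calls = count_read_calls(tmp_path, 64 * 1024)   # 64k blocks
finally:
    os.unlink(tmp_path)

print(small_calls, large_calls, small_calls // large_calls)
```

This only counts calls; it says nothing about how expensive each call
is, which depends on the platform.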

So, in my benchmark, we went from 148271 syscalls/second to 3228
syscalls/second for 64k block size, and we got a 40% perf increase on
top of this...  i.e. we were spending ~40% of the CPU time doing an
extra ~145k syscalls/second instead of doing real work...
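
The arithmetic behind those figures can be sanity-checked with a few
lines. This is a sketch under two assumptions not stated outright in
the benchmark: that the first rate was measured with 1k blocks, and
that both runs moved the same total amount of data, so the total number
of calls dropped by exactly 64x.

```python
# Figures quoted from the benchmark above.
rate_before = 148271   # syscalls/second, assumed to be the 1k block run
rate_after = 3228      # syscalls/second with 64k blocks

# Assumption: same total data, blocks 64x larger, so 64x fewer calls.
calls_ratio = 64

# Difference in syscall rate: this is the "145k" figure.
saved_per_second = rate_before - rate_after

# total_calls = rate * elapsed_time, so
# time_before / time_after = calls_ratio / (rate_before / rate_after)
rate_ratio = rate_before / rate_after
implied_speedup = calls_ratio / rate_ratio

print(saved_per_second, round(rate_ratio, 1), round(implied_speedup, 2))
```

Under those assumptions the implied speedup comes out a little under
1.4x, which lines up with the roughly 40% improvement reported.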

> Please don't take this as bikeshedding, I'd really like to see some
> "this is why it's faster" analysis rather than just numbers thrown
> around.

I don't really see a need to analyze this any more... We are batching
work in a more efficient manner...  I could list many other examples
of where we do similar optimizations...

  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."