On 05/13/15 10:27, David Chisnall wrote:
On 13 May 2015, at 09:03, John-Mark Gurney <j...@funkthat.com> wrote:

Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +0000:
In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:

Also, you'd probably see even better performance by increasing the
size to 64k, [...]

        8K on 32bit
        64k on 64bit

Sounds good to me.  For anyone who cares, I ran a quick set of
benchmarks on sha256, using my preliminary patch for SSE4-optimized
sha256.  The results should be similar for the others.

The ministat output gives the time in seconds it takes my 3.4GHz AMD
A10-5700 APU running HEAD to process a 512MB file, where lower is
better.  I've converted those times to throughput, where higher is
better:
BUFSIZ: 145MB/sec
8k:     193MB/sec
16k:    198MB/sec
64k:    202MB/sec
128k:   202MB/sec
-t:     211MB/sec
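
For anyone wanting to repeat the comparison on their own hardware, the
measurement can be approximated along these lines.  This is only a rough
sketch, not the patch or the ministat runs from the post: it uses Python's
hashlib rather than the sha256 tool, the names sha256_file, measure and
SIZES are made up for illustration, and the BUFSIZ value of 1024 bytes is
an assumption (the post does not state it).  The -t row is not reproduced.

```python
import hashlib
import os
import time

# Buffer sizes from the table above. The BUFSIZ entry assumes 1024 bytes,
# which the post does not state.
SIZES = {
    "BUFSIZ": 1024,
    "8k": 8 * 1024,
    "16k": 16 * 1024,
    "64k": 64 * 1024,
    "128k": 128 * 1024,
}


def sha256_file(path, bufsize):
    """Return the SHA-256 hex digest of path, reading bufsize bytes per call."""
    digest = hashlib.sha256()
    # buffering=0 so each read() asks the OS for at most bufsize bytes
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def measure(path, sizes=SIZES):
    """Return {label: MB/sec} for one timed pass over path per buffer size."""
    nbytes = os.path.getsize(path)
    results = {}
    for label, bufsize in sizes.items():
        start = time.perf_counter()
        sha256_file(path, bufsize)
        elapsed = max(time.perf_counter() - start, 1e-9)
        # Same conversion as above: file size in MB divided by seconds
        results[label] = nbytes / (1024 * 1024) / elapsed
    return results
```

Calling measure() on a large file returns one throughput figure per
buffer size.  A single pass is noisy, so several runs per size (fed to
ministat, as in the post) give a fairer picture.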

It looks like most of the benefit is gained at 16KB.  Did you try running the 
benchmark with something else running at the same time to see if there is any 
advantage in trashing the caches a bit less (simple case, what happens if you 
run two instances of the same benchmark at once)?

I suspect that you’re about right anyway - I recently did some tests while 
playing with JavaScript FFI generation with a multithreaded process JavaScript 
environment calling out to OpenSSL to do SHA calculations and having each of 8 
threads reading in 128KB chunks gave the fastest performance (Core i7, 4 cores 
+ hyperthreading), with only a negligible gain over 64KB.  In all cases, the 
JavaScript implementation was significantly faster than the openssl tool, which 
used 8KB buffers.
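
A rough way to try the concurrent case (several hashers at once, each
reading fixed-size chunks) is sketched below.  It is not the JavaScript/
OpenSSL setup described above: it uses Python threads and hashlib, every
worker hashes the same file, and the names sha256_file and
hash_concurrently are made up for illustration.  It relies on hashlib
releasing the GIL while hashing buffers larger than 2047 bytes, so the
threads can run on separate cores.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path, bufsize):
    """Return the SHA-256 hex digest of path, reading bufsize bytes per call."""
    digest = hashlib.sha256()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def hash_concurrently(path, workers=8, bufsize=128 * 1024):
    """Hash path in `workers` threads at once.

    Returns (list of digests, aggregate MB/sec across all workers).
    """
    nbytes = os.path.getsize(path)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(
            pool.map(lambda _: sha256_file(path, bufsize), range(workers))
        )
    elapsed = max(time.perf_counter() - start, 1e-9)
    return digests, workers * nbytes / (1024 * 1024) / elapsed
```

Comparing hash_concurrently(path, workers=2) across buffer sizes against
the single-instance figures would show whether smaller buffers help when
the caches are shared.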


You should also try this using a USB disk. The performance numbers depend heavily on the hardware's interrupt moderation values.


freebsd-current@freebsd.org mailing list