On Wed, 22 Feb 2017, Jacob Champion wrote:
To make results less confusing, any specific patches/branch I should
test? My baseline is httpd-2.4.25 + httpd-2.4.25-deps
--with-included-apr FWIW.
2.4.25 is just fine. We'll have to make sure there's nothing substantially
different about it performance-wise before we backport patches anyway, so
it'd be good to start testing it now.
OK.
- The OpenSSL test server, writing from memory: 1.2 GiB/s
- httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s
- httpd trunk with `EnableMMAP off`: 580 MiB/s
- httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s
At those speeds, your results might be skewed by the latency of
processing 10 MiB GETs.
Maybe, but keep in mind I care more about the difference between the numbers
than the absolute throughput ceiling here. (In any case, I don't see
significantly different numbers between 10 MiB and 1 GiB files. Remember, I'm
testing via loopback.)
Ah, right.
Discard the results from the first warm-up
access and your results delivering from memory or disk (cache) shouldn't
differ.
Ah, but they *do*, as Yann pointed out earlier. We can't just deliver the
disk cache to OpenSSL for encryption; it has to be copied into some
addressable buffer somewhere. That seems to be a major reason for the mmap()
advantage, compared to a naive read() solution that just reads into a small
buffer over and over again.
(I am trying to set up Valgrind to confirm where the test server is spending
most of its time, but it doesn't care for the large in-memory static buffer,
or for OpenSSL's compressed debugging symbols, and crashes. :( )
Any joy with something simpler like gprof? (Caveat: I haven't used it
in ages, so I don't know if it's even applicable nowadays.)
Numbers on the "memcopy penalty" would indeed be interesting,
especially any variation when the block size differs.
As I said, our live server does 600 MB/s aes-128-gcm and can deliver 300
MB/s https without mmap. That's only a factor 2 difference between
aes-128-gcm speed and delivered speed.
Your results above are almost a factor 4 off, so something's fishy :-)
Well, I can only report my methodology and numbers -- whether the numbers are
actually meaningful has yet to be determined. ;D More testers are welcome!
:-)
I did some repeated tests and my initial results were actually a bit
on the low side:
Server CPU is an Intel E5606 (1st-gen AES offload); `openssl speed -evp`
says:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm    208536.05k   452980.05k   567523.33k   607578.11k   619192.32k
Single-stream https over a 10 Gbps link with 3 ms RTT (thanks to a
routing SNAFU, traffic to stuff in the neighboring building takes the
"shortcut" through a town 300 km away ;).
Using wget -O /dev/null as a client, on a host with Intel E5-2630 CPU
(960-ish MB/s aes-128-gcm on 8k blocks).
http (sendfile): 1.07 GB/s (repeatedly)
httpd (no mmap): 370-380 MB/s
openssl s_server: 330-340 MB/s
So httpd isn't beaten by the naive openssl s_server approach, at least
;-)
Going off on a tangent here:
For those of you who actually know how the ssl stuff really works, is
it possible to get multiple threads involved in doing the encryption,
or do you need the results from the previous block in order to do the
next one? Yes, I know this wouldn't make sense for most real setups
but for a student computer club with old hardware and good
connectivity this is a real problem ;-)
On the other hand, you would need it to do 100 Gbps single-stream
https even on latest&greatest CPUs 8-)
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---------------------------------------------------------------------------
There may be a correlation between humor and sex. - Data
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=