> On 22.02.2017 at 00:14, Jacob Champion <champio...@gmail.com> wrote:
> 
> On 02/19/2017 01:37 PM, Niklas Edmundsson wrote:
>> On Thu, 16 Feb 2017, Jacob Champion wrote:
>>> So, I had already hacked my O_DIRECT bucket case to just be a copy of
>>> APR's file bucket, minus the mmap() logic. I tried making this change
>>> on top of it...
>>> 
>>> ...and holy crap, for regular HTTP it's *faster* than our current
>>> mmap() implementation. HTTPS is still slower than with mmap, but
>>> faster than it was without the change. (And the HTTPS performance has
>>> been really variable.)
>> 
>> I'm guessing that this is with a low-latency storage device, say a
>> local SSD with low load? O_DIRECT on anything with latency would require
>> way bigger blocks to hide the latency... You really want the OS
>> readahead in the generic case, simply because it performs reasonably
>> well in most cases.
> 
> I described my setup really poorly. I've ditched O_DIRECT entirely. The 
> bucket type I created to use O_DIRECT has been repurposed to just be a copy 
> of the APR file bucket, with the mmap optimization removed entirely, and with 
> the new 64K bucket buffer limit. This new "no-mmap-plus-64K-block" file 
> bucket type performs better on my machine than the old "mmap-enabled" file 
> bucket type.
> 
> (But yes, my testing is all local, with a nice SSD. Hopefully that gets a 
> little closer to isolating the CPU parts of this equation, which is the thing 
> we have the most influence over.)
> 
> >> I think the big win here is to use appropriate block sizes: you do more
> >> useful work and less housekeeping. I have no idea when the block-size
> >> choices were made, but it was likely a while ago. Assuming that things
> >> will continue to evolve, I'd say making hard-coded numbers tunable is a
> >> Good Thing to do.
> 
> Agreed.
> 
> >> Is there interest in more real-life numbers with an increased
> >> FILE_BUCKET_BUFF_SIZE, or are you already on it?
> 
> Yes please! My laptop probably isn't representative of most servers; it can 
> do nearly 3 GB/s AES-128-GCM. The more machines we test, the better.
> 
> >> I have an older server
> >> that can do 600 MB/s aes-128-gcm per core, but is only able to deliver
> >> 300 MB/s https single-stream via its 10 Gbps interface. My guess is that
> >> too-small blocks cause CPU cycles to be spent on housekeeping rather than
> >> on delivering data...
> 
> Right. To give you an idea of where I am in testing at the moment: I have a 
> basic test server written with OpenSSL. It sends a 10 MiB response body from 
> memory (*not* from disk) for every GET it receives. I also have a copy of 
> httpd trunk that's serving an actual 10 MiB file from disk.
> 
> My test call is just `h2load --h1 -n 100 https://localhost/`, which should 
> send 100 requests over a single TLS connection. The ciphersuite selected for 
> all test cases is ECDHE-RSA-AES256-GCM-SHA384. For reference, I can do 
> in-memory AES-256-GCM at 2.1 GiB/s.
> 
> - The OpenSSL test server, writing from memory: 1.2 GiB/s
> - httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s
> - httpd trunk with `EnableMMAP off`: 580 MiB/s
> - httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s
> 
> So just bumping the block size gets me almost to the speed of mmap, without 
> the downside of a potential SIGBUS. Meanwhile, the OpenSSL test server seems 
> to suggest a performance ceiling about 50% above where we are now.
> 
> Even with the test server serving responses from memory, that seems like 
> plenty of room to grow. I'm working on a version of the test server that 
> serves files from disk so that I'm not comparing apples to oranges, but my 
> prior testing leads me to believe that disk access is not the limiting factor 
> on my machine.
> 
> --Jacob

Just so I do not misunderstand: 

you increased BUCKET_BUFF_SIZE in APR from 8000 to 64K? That is what you are 
testing?

Stefan Eissing

<green/>bytes GmbH
Hafenstrasse 16
48155 Münster
www.greenbytes.de
