On Fri, Jun 24, 2011 at 03:36, Robert Dionne <[email protected]> wrote:
> This is interesting work, I notice some substantial changes to couch_btree, a new query_modify_raw, etc.
>
> I'm wondering though if we'd be better off to base these changes on the refactored version of couch_btree that davisp has [1]. I haven't looked at it too closely or tested with it, but if I recall the goal was first to achieve a more readable version with identical semantics, so that we could then move forward with improvements.
>
> [1] https://github.com/davisp/couchdb/commit/37c1c9b4b90f6c0f3c22b75dfb2ae55c8b708ab1
>
I think the only thing holding that back was good benchmarking. Can we throw these new benchmarks at that branch?

> On Jun 24, 2011, at 6:06 AM, Filipe David Manana wrote:
>
>> Thanks Adam.
>>
>> Don't get too scared :) Ignore the commit history and just look at github's "Files changed" tab; the modification summary is:
>>
>> "Showing 19 changed files with 730 additions and 402 deletions."
>>
>> More than half of those commits were merges with trunk, many snappy refactorings (before it was added to trunk) and other experiments that were reverted afterwards.
>> We'll try to break this into 2 or 3 patches.
>>
>> So the single patch is something relatively small:
>> https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test.diff
>>
>> On Fri, Jun 24, 2011 at 4:05 AM, Adam Kocoloski <[email protected]> wrote:
>>> Hi Damien, I'd like to see these 220 commits rebased into a set of logical patches against trunk. It'll make the review easier and will help future devs track down any bugs that are introduced. Best,
>>>
>>> Adam
>>>
>>> On Jun 23, 2011, at 6:49 PM, Damien Katz wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> As many of you know, Filipe and I have been working on improving performance, especially write performance [1]. This work has been public in the Couchbase github account since the beginning, and the non-Couchbase-specific changes are now isolated in [2] and [3].
>>>> In [3] there's an Erlang module used to test the performance of writing and updating batches of documents concurrently, which was used, amongst other tools, to measure the performance gains. This module bypasses the network stack and the JSON parsing, so it basically lets us see more easily how significant the changes in couch_file, couch_db and couch_db_updater are.
>>>>
>>>> The main and most important change is asynchronous writes. The file module no longer blocks callers until the write calls complete. Instead, it immediately replies to the caller with the position in the file where the data is going to be written. The data is then sent to a dedicated loop process that continuously writes the data it receives from the couch_file gen_server to disk (batching when possible). This allows callers (such as the db updater, for example) to issue write calls and keep doing other work (preparing documents, etc.) while the writes are done in parallel. After issuing all the writes, callers simply call the new 'flush' function in the couch_file gen_server, which blocks the caller until everything has effectively been written to disk - normally this flush call ends up not blocking the caller at all, or blocking it only for a very short period.
>>>>
>>>> There are other changes such as avoiding 2 btree lookups per document ID (COUCHDB-1084 [4]), faster sorting in the updater (O(n log n) vs O(n^2)), and avoiding sorting of already sorted lists in the updater.
>>>>
>>>> Checking whether attachments are compressible was also moved into a new module/process. We verified this took a lot of CPU time when all or most of the documents being written/updated have attachments - building the regexps and matching against them for every single attachment is surprisingly expensive.
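
[Aside: to make the async write/flush flow described above a bit more concrete, here is a rough sketch of the pattern. It is illustrative only - the module, record and message names are assumptions, not the actual couch_file code from the branch.]

-module(async_appender).
-behaviour(gen_server).
-export([start_link/1, append_bin/2, flush/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, code_change/3]).

%% Assumes a fresh file, so the logical end-of-file starts at 0.
-record(st, {eof = 0, writer}).

start_link(Path) ->
    gen_server:start_link(?MODULE, Path, []).

%% Replies immediately with the position the data will end up at.
append_bin(Server, Bin) ->
    gen_server:call(Server, {append_bin, Bin}, infinity).

%% Blocks only until every previously issued append has reached the file.
flush(Server) ->
    gen_server:call(Server, flush, infinity).

init(Path) ->
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    Writer = spawn_link(fun() -> writer_loop(Fd) end),
    {ok, #st{writer = Writer}}.

handle_call({append_bin, Bin}, _From, #st{eof = Pos, writer = W} = St) ->
    W ! {write, Bin},                 % hand the bytes off to the writer loop
    {reply, {ok, Pos}, St#st{eof = Pos + iolist_size(Bin)}};
handle_call(flush, From, #st{writer = W} = St) ->
    W ! {flush, From},                % the writer answers once it has caught up
    {noreply, St}.

handle_cast(_Msg, St) -> {noreply, St}.
handle_info(_Msg, St) -> {noreply, St}.
terminate(_Reason, _St) -> ok.
code_change(_Old, St, _Extra) -> {ok, St}.

%% Messages arrive in the order the gen_server forwarded them, so a flush
%% request is only answered after all earlier writes have hit the disk.
writer_loop(Fd) ->
    receive
        {write, Bin} ->
            ok = file:write(Fd, Bin),
            writer_loop(Fd);
        {flush, From} ->
            ok = file:sync(Fd),
            gen_server:reply(From, ok),
            writer_loop(Fd)
    end.

[The point of the pattern is that append_bin/2 never waits for the disk - only flush/1 does - which is what lets callers like the db updater overlap document preparation with the actual writes.]
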
>>>>
>>>> There's also a new couch_db:update_doc/s flag named 'optimistic', which basically changes the behaviour to write the document bodies before entering the updater and to skip some attachment-related checks (duplicate names, for example). This flag is not yet exposed to the HTTP API, but it could be, for example via an X-Optimistic-Write header in doc PUT/POST requests and _bulk_docs. We've seen this help when the client knows that the documents to write don't exist yet in the database and we aren't already IO bound, such as when SSDs are used.
>>>>
>>>> We used relaximation, Filipe's basho bench based tests [5] and the Erlang test module mentioned before [6, 7], exposed via the HTTP API. Here follow some benchmark results.
>>>>
>>>>
>>>> # Using the Erlang test module (test output)
>>>>
>>>> ## 1Kb documents, 10 concurrent writers, batches of 500 docs
>>>>
>>>> trunk before snappy was added:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":270071}
>>>>
>>>> trunk:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":157328}
>>>>
>>>> trunk + async writes (and snappy):
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":121518}
>>>>
>>>> ## 2.5Kb documents, 10 concurrent writers, batches of 500 docs
>>>>
>>>> trunk before snappy was added:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":507098}
>>>>
>>>> trunk:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":230391}
>>>>
>>>> trunk + async writes (and snappy):
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":190151}
>>>>
>>>>
>>>> # basho bench tests, via the public HTTP APIs
>>>>
>>>> ## batches of 1 1Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 147 702 docs written
>>>> branch: 149 534 docs written
>>>>
>>>> ## batches of 10 1Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 878 520 docs written
>>>> branch: 991 330 docs written
>>>>
>>>> ## batches of 100 1Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 1 627 600 docs written
>>>> branch: 1 865 800 docs written
>>>>
>>>> ## batches of 1 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 142 531 docs written
>>>> branch: 143 012 docs written
>>>>
>>>> ## batches of 10 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 724 880 docs written
>>>> branch: 780 690 docs written
>>>>
>>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 1 028 600 docs written
>>>> branch: 1 152 800 docs written
>>>>
>>>>
>>>> # basho bench tests, via the internal Erlang APIs
>>>>
>>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 3 170 100 docs written
>>>> branch: 3 359 900 docs written
>>>>
>>>>
>>>> # Relaximation tests
>>>>
>>>> 1Kb docs:
>>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b83002a1a
>>>>
>>>> 2.5Kb docs:
>>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b830022c0
>>>>
>>>> 4Kb docs:
>>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b8300330d
>>>>
>>>> All the documents used for these tests can be found at:
>>>> https://github.com/fdmanana/basho_bench_couch/tree/master/couch_docs
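
[Aside on the 'optimistic' flag mentioned above the benchmark numbers: assuming it is passed like the other couch_db update options - an assumption on my part, the branch may expose it differently - using it from Erlang could look roughly like this:]

%% Hypothetical usage sketch - the exact option shape is assumed, not verified
%% against the branch. 'optimistic' writes the doc bodies before entering the
%% updater and skips some attachment checks, so it only makes sense when the
%% docs are known not to exist yet and the node isn't already IO bound.
{ok, Db} = couch_db:open_int(<<"load_test">>, []),
Doc = couch_doc:from_json_obj({[{<<"_id">>, <<"doc-1">>}, {<<"value">>, 1}]}),
{ok, _Results} = couch_db:update_docs(Db, [Doc], [optimistic]),
couch_db:close(Db).

[Over HTTP the same thing would presumably become the X-Optimistic-Write header on PUT/POST and _bulk_docs that Damien suggests, once that part is exposed.]
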
>>>>
>>>>
>>>> Now some view indexing tests.
>>>>
>>>> # indexer_test_2 database (http://fdmanana.couchone.com/_utils/database.html?indexer_test_2)
>>>>
>>>> ## trunk
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 20m51.388s
>>>> user 0m0.040s
>>>> sys 0m0.000s
>>>>
>>>> ## branch async writes
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 15m17.908s
>>>> user 0m0.008s
>>>> sys 0m0.020s
>>>>
>>>> # indexer_test_3 database (http://fdmanana.couchone.com/_utils/database.html?indexer_test_3)
>>>>
>>>> ## trunk
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 21m17.346s
>>>> user 0m0.012s
>>>> sys 0m0.028s
>>>>
>>>> ## branch async writes
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 16m28.558s
>>>> user 0m0.012s
>>>> sys 0m0.020s
>>>>
>>>> We don't see nearly as big an improvement for single-write-per-request benchmarks as we do with bulk writes. This is due to the HTTP request overhead and our own inefficiencies at that layer. We still have lots of room for optimizations at the networking layer.
>>>>
>>>> We'd like to merge this code into trunk next week, by Wednesday. Please respond with any improvements, objections or comments by then. Thanks!
>>>>
>>>> -Damien
>>>>
>>>>
>>>> [1] - http://blog.couchbase.com/driving-performance-improvements-couchbase-single-server-two-dot-zero
>>>> [2] - https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test
>>>> [3] - https://github.com/fdmanana/couchdb/compare/async_file_writes
>>>> [4] - https://issues.apache.org/jira/browse/COUCHDB-1084
>>>> [5] - https://github.com/fdmanana/basho_bench_couch
>>>> [6] - https://github.com/fdmanana/couchdb/blob/async_file_writes/gen_load.sh
>>>> [7] - https://github.com/fdmanana/couchdb/blob/async_file_writes/src/couchdb/couch_internal_load_gen.erl
>>>
>>>
>>
>>
>> --
>> Filipe David Manana,
>> [email protected], [email protected]
>>
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
