On Fri, Jun 24, 2011 at 03:36, Robert Dionne <[email protected]> wrote:
> This is interesting work, I notice some substantial changes to couch_btree, a new query_modify_raw, etc.
>
> I'm wondering though if we'd be better off to base these changes on the refactored version of couch_btree that davisp has [1]. I haven't looked at it too closely or tested with it, but if I recall the goal was first to achieve a more readable version with identical semantics, so that we could then move forward with improvements.
>
> [1] https://github.com/davisp/couchdb/commit/37c1c9b4b90f6c0f3c22b75dfb2ae55c8b708ab1
>
I think the only thing holding that back was good benchmarking. Can we throw these new benchmarks at that branch?

> On Jun 24, 2011, at 6:06 AM, Filipe David Manana wrote:
>
>> Thanks Adam.
>>
>> Don't get too scared :) Ignore the commit history and just look at github's "Files changed" tab; the modification summary is:
>>
>> "Showing 19 changed files with 730 additions and 402 deletions."
>>
>> More than half of those commits were merges with trunk, many snappy refactorings (before it was added to trunk) and other experiments that were reverted afterwards.
>> We'll try to break this into 2 or 3 patches.
>>
>> So the single patch is something relatively small:
>> https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test.diff
>>
>> On Fri, Jun 24, 2011 at 4:05 AM, Adam Kocoloski <[email protected]> wrote:
>>> Hi Damien, I'd like to see these 220 commits rebased into a set of logical patches against trunk. It'll make the review easier and will help future devs track down any bugs that are introduced. Best,
>>>
>>> Adam
>>>
>>> On Jun 23, 2011, at 6:49 PM, Damien Katz wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> As many of you know, Filipe and I have been working on improving performance, especially write performance [1]. This work has been public in the Couchbase github account since the beginning, and the non-Couchbase-specific changes are now isolated in [2] and [3].
>>>> In [3] there's an Erlang module used to test the performance of writing and updating batches of documents concurrently, which was used, amongst other tools, to measure the performance gains. This module bypasses the network stack and the JSON parsing, so it basically lets us see more easily how significant the changes in couch_file, couch_db and couch_db_updater are.
>>>>
>>>> The main and most important change is asynchronous writes. The file module no longer blocks callers until the write calls complete. Instead, it immediately replies to the caller with the position in the file where the data is going to be written. The data is then sent to a dedicated loop process that continuously writes the data it receives from the couch_file gen_server to disk (batching when possible). This allows callers (such as the db updater, for example) to issue write calls and keep doing other work (preparing documents, etc.) while the writes are done in parallel. After issuing all the writes, callers simply call the new 'flush' function in the couch_file gen_server, which blocks the caller until everything has effectively been written to disk - normally this flush call ends up not blocking the caller at all, or blocking it only for a very short period.
>>>>
>>>> There are other changes such as avoiding 2 btree lookups per document ID (COUCHDB-1084 [4]), faster sorting in the updater (O(n log n) vs O(n^2)), and avoiding sorting of already sorted lists in the updater.
>>>>
>>>> Checking whether attachments are compressible was also moved into a new module/process. We verified this took a lot of CPU time when all or most of the documents being written/updated have attachments - building the regexps and matching against them for every single attachment is surprisingly expensive.
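
[Aside: to make the async write/flush flow described above a bit more concrete, here is a rough sketch of the pattern. It is illustrative only - the module, record and message names are assumptions, not the actual couch_file code from the branch.]

-module(async_appender).
-behaviour(gen_server).
-export([start_link/1, append_bin/2, flush/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, code_change/3]).

%% Assumes a fresh file, so the logical end-of-file starts at 0.
-record(st, {eof = 0, writer}).

start_link(Path) ->
    gen_server:start_link(?MODULE, Path, []).

%% Replies immediately with the position the data will end up at.
append_bin(Server, Bin) ->
    gen_server:call(Server, {append_bin, Bin}, infinity).

%% Blocks only until every previously issued append has reached the file.
flush(Server) ->
    gen_server:call(Server, flush, infinity).

init(Path) ->
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    Writer = spawn_link(fun() -> writer_loop(Fd) end),
    {ok, #st{writer = Writer}}.

handle_call({append_bin, Bin}, _From, #st{eof = Pos, writer = W} = St) ->
    W ! {write, Bin},                 % hand the bytes off to the writer loop
    {reply, {ok, Pos}, St#st{eof = Pos + iolist_size(Bin)}};
handle_call(flush, From, #st{writer = W} = St) ->
    W ! {flush, From},                % the writer answers once it has caught up
    {noreply, St}.

handle_cast(_Msg, St) -> {noreply, St}.
handle_info(_Msg, St) -> {noreply, St}.
terminate(_Reason, _St) -> ok.
code_change(_Old, St, _Extra) -> {ok, St}.

%% Messages arrive in the order the gen_server forwarded them, so a flush
%% request is only answered after all earlier writes have hit the disk.
writer_loop(Fd) ->
    receive
        {write, Bin} ->
            ok = file:write(Fd, Bin),
            writer_loop(Fd);
        {flush, From} ->
            ok = file:sync(Fd),
            gen_server:reply(From, ok),
            writer_loop(Fd)
    end.

[The point of the pattern is that append_bin/2 never waits for the disk - only flush/1 does - which is what lets callers like the db updater overlap document preparation with the actual writes.]
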
>>>>
>>>> There's also a new couch_db:update_doc/s flag named 'optimistic', which basically changes the behaviour to write the document bodies before entering the updater and to skip some attachment-related checks (duplicate names, for example). This flag is not yet exposed to the HTTP API, but it could be, for example via an X-Optimistic-Write header in doc PUT/POST requests and _bulk_docs. We've seen this help when the client knows that the documents to write don't exist yet in the database and we aren't already IO bound, such as when SSDs are used.
>>>>
>>>> We used relaximation, Filipe's basho bench based tests [5] and the Erlang test module mentioned before [6, 7], exposed via the HTTP API. Here follow some benchmark results.
>>>>
>>>>
>>>> # Using the Erlang test module (test output)
>>>>
>>>> ## 1Kb documents, 10 concurrent writers, batches of 500 docs
>>>>
>>>> trunk before snappy was added:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":270071}
>>>>
>>>> trunk:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":157328}
>>>>
>>>> trunk + async writes (and snappy):
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":121518}
>>>>
>>>> ## 2.5Kb documents, 10 concurrent writers, batches of 500 docs
>>>>
>>>> trunk before snappy was added:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":507098}
>>>>
>>>> trunk:
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":230391}
>>>>
>>>> trunk + async writes (and snappy):
>>>>
>>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":190151}
>>>>
>>>>
>>>> # basho bench tests, via the public HTTP APIs
>>>>
>>>> ## batches of 1 1Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 147 702 docs written
>>>> branch: 149 534 docs written
>>>>
>>>> ## batches of 10 1Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 878 520 docs written
>>>> branch: 991 330 docs written
>>>>
>>>> ## batches of 100 1Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 1 627 600 docs written
>>>> branch: 1 865 800 docs written
>>>>
>>>> ## batches of 1 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 142 531 docs written
>>>> branch: 143 012 docs written
>>>>
>>>> ## batches of 10 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 724 880 docs written
>>>> branch: 780 690 docs written
>>>>
>>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 1 028 600 docs written
>>>> branch: 1 152 800 docs written
>>>>
>>>>
>>>> # basho bench tests, via the internal Erlang APIs
>>>>
>>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>>>
>>>> trunk: 3 170 100 docs written
>>>> branch: 3 359 900 docs written
>>>>
>>>>
>>>> # Relaximation tests
>>>>
>>>> 1Kb docs:
>>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b83002a1a
>>>>
>>>> 2.5Kb docs:
>>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b830022c0
>>>>
>>>> 4Kb docs:
>>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b8300330d
>>>>
>>>> All the documents used for these tests can be found at:
>>>> https://github.com/fdmanana/basho_bench_couch/tree/master/couch_docs
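
[Aside on the 'optimistic' flag mentioned above the benchmark numbers: assuming it is passed like the other couch_db update options - an assumption on my part, the branch may expose it differently - using it from Erlang could look roughly like this:]

%% Hypothetical usage sketch - the exact option shape is assumed, not verified
%% against the branch. 'optimistic' writes the doc bodies before entering the
%% updater and skips some attachment checks, so it only makes sense when the
%% docs are known not to exist yet and the node isn't already IO bound.
{ok, Db} = couch_db:open_int(<<"load_test">>, []),
Doc = couch_doc:from_json_obj({[{<<"_id">>, <<"doc-1">>}, {<<"value">>, 1}]}),
{ok, _Results} = couch_db:update_docs(Db, [Doc], [optimistic]),
couch_db:close(Db).

[Over HTTP the same thing would presumably become the X-Optimistic-Write header on PUT/POST and _bulk_docs that Damien suggests, once that part is exposed.]
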
>>>>
>>>>
>>>> Now some view indexing tests.
>>>>
>>>> # indexer_test_2 database (http://fdmanana.couchone.com/_utils/database.html?indexer_test_2)
>>>>
>>>> ## trunk
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 20m51.388s
>>>> user 0m0.040s
>>>> sys 0m0.000s
>>>>
>>>> ## branch async writes
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 15m17.908s
>>>> user 0m0.008s
>>>> sys 0m0.020s
>>>>
>>>> # indexer_test_3 database (http://fdmanana.couchone.com/_utils/database.html?indexer_test_3)
>>>>
>>>> ## trunk
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 21m17.346s
>>>> user 0m0.012s
>>>> sys 0m0.028s
>>>>
>>>> ## branch async writes
>>>>
>>>> $ time curl http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>>> {"total_rows":1102400,"offset":0,"rows":[
>>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>>> ]}
>>>>
>>>> real 16m28.558s
>>>> user 0m0.012s
>>>> sys 0m0.020s
>>>>
>>>> We don't see nearly as big an improvement for single-write-per-request benchmarks as we do with bulk writes. This is due to the HTTP request overhead and our own inefficiencies at that layer. We still have lots of room for optimizations at the networking layer.
>>>>
>>>> We'd like to merge this code into trunk next week, by Wednesday. Please respond with any improvements, objections or comments by then. Thanks!
>>>>
>>>> -Damien
>>>>
>>>>
>>>> [1] - http://blog.couchbase.com/driving-performance-improvements-couchbase-single-server-two-dot-zero
>>>> [2] - https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test
>>>> [3] - https://github.com/fdmanana/couchdb/compare/async_file_writes
>>>> [4] - https://issues.apache.org/jira/browse/COUCHDB-1084
>>>> [5] - https://github.com/fdmanana/basho_bench_couch
>>>> [6] - https://github.com/fdmanana/couchdb/blob/async_file_writes/gen_load.sh
>>>> [7] - https://github.com/fdmanana/couchdb/blob/async_file_writes/src/couchdb/couch_internal_load_gen.erl
>>>
>>>
>>
>>
>> --
>> Filipe David Manana,
>> [email protected], [email protected]
>>
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
