Hi Damien, I'd like to see these 220 commits rebased into a set of logical
patches against trunk. It'll make the review easier and will help future devs
track down any bugs that are introduced. Best,
Adam
On Jun 23, 2011, at 6:49 PM, Damien Katz wrote:
> Hi everyone,
>
> As many of you know, Filipe and I have been working on improving
> performance, especially write performance [1]. This work has been public in
> the Couchbase GitHub account since the beginning, and the non-Couchbase-specific
> changes are now isolated in [2] and [3].
> In [3] there’s an Erlang module used to test performance when
> writing and updating batches of documents concurrently; it was used,
> amongst other tools, to measure the performance gains. This module bypasses
> the network stack and the JSON parsing, which lets us see
> more clearly how significant the changes to couch_file, couch_db and
> couch_db_updater are.
>
> The main and most important change is asynchronous writes. The file module no
> longer blocks callers until the write calls complete. Instead it
> immediately replies to the caller with the position in the file where the data
> will be written. The data is then sent to a dedicated loop process
> that continuously writes the data it receives, from the couch_file
> gen_server, to disk (batching when possible). This allows callers (such
> as the db updater, for example) to issue write calls and keep doing other work
> (preparing documents, etc.) while the writes happen in parallel. After
> issuing all the writes, callers simply call the new ‘flush’ function in the
> couch_file gen_server, which blocks the caller until everything has
> actually been written to disk - normally this flush call doesn’t block
> the caller at all, or blocks it only briefly.
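The append-then-flush pattern described above can be sketched like this (a minimal illustration only, not the actual couch_file code — it uses two plain processes instead of the real gen_server, and the message names are made up). The front process replies with the file position immediately; a dedicated writer process does the disk I/O; flush is just a synchronous round-trip that drains the writer's queue:

```erlang
-module(async_writer_sketch).
-export([start/1, append/2, flush/1]).

start(Path) ->
    {ok, Fd} = file:open(Path, [write, raw, binary]),
    Writer = spawn_link(fun() -> writer_loop(Fd) end),
    Front = spawn_link(fun() -> front_loop(Writer, 0) end),
    {ok, Front}.

%% Returns the position where Bin WILL land, before it is on disk.
append(Front, Bin) when is_binary(Bin) ->
    Front ! {append, self(), Bin},
    receive {pos, Pos} -> {ok, Pos} end.

%% Blocks only until all previously issued writes are done.
flush(Front) ->
    Front ! {flush, self()},
    receive flushed -> ok end.

front_loop(Writer, Eof) ->
    receive
        {append, From, Bin} ->
            From ! {pos, Eof},          %% reply before the write happens
            Writer ! {write, Bin},
            front_loop(Writer, Eof + byte_size(Bin));
        {flush, From} ->
            Writer ! {sync, self()},    %% mailbox order drains pending writes
            receive synced -> From ! flushed end,
            front_loop(Writer, Eof)
    end.

writer_loop(Fd) ->
    receive
        {write, Bin} -> ok = file:write(Fd, Bin), writer_loop(Fd);
        {sync, From} -> From ! synced, writer_loop(Fd)
    end.
```

The flush correctness here rests on Erlang's per-sender message ordering: the writer sees the `sync` only after every `write` the front process sent before it.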
>
> There are other changes, such as avoiding 2 btree lookups per document ID
> (COUCHDB-1084 [4]), faster sorting in the updater (O(n log n) vs O(n^2)), and
> avoiding re-sorting already-sorted lists in the updater.
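The already-sorted case amounts to a cheap O(n) guard in front of lists:sort/1 — sketched here for illustration only (the updater's real code differs):

```erlang
-module(sort_sketch).
-export([sort_if_needed/1]).

%% Skip the O(n log n) sort when a single O(n) scan shows the
%% list is already in order.
sort_if_needed(L) ->
    case is_sorted(L) of
        true  -> L;
        false -> lists:sort(L)
    end.

is_sorted([A, B | _]) when A > B -> false;
is_sorted([_ | Rest]) -> is_sorted(Rest);
is_sorted(_) -> true.
```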
>
> Checking whether attachments are compressible was also moved into a new
> module/process. We verified this was taking significant CPU time when all or
> most of the documents to write/update have attachments - building the regexps
> and matching against them for every single attachment is surprisingly expensive.
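The compile-once idea can be sketched as a long-lived process that builds its regexp at startup and answers check requests (the pattern and message protocol below are assumptions for illustration, not the branch's actual code — CouchDB derives the real pattern from configuration):

```erlang
-module(compress_check_sketch).
-export([start/0, is_compressible/2]).

start() ->
    %% Compiled exactly once, instead of per attachment.
    %% Illustrative pattern; the real list of compressible MIME
    %% types comes from the server config.
    {ok, Re} = re:compile(<<"^text/|^application/(json|xml)">>, [caseless]),
    spawn_link(fun() -> loop(Re) end).

is_compressible(Pid, MimeType) ->
    Pid ! {check, self(), MimeType},
    receive {compressible, Bool} -> Bool end.

loop(Re) ->
    receive
        {check, From, MimeType} ->
            From ! {compressible, re:run(MimeType, Re) =/= nomatch},
            loop(Re)
    end.
```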
>
> There’s also a new couch_db:update_doc/s flag named ‘optimistic’ which
> changes the behaviour to write the document bodies before entering
> the updater and to skip some attachment-related checks (duplicate names, for
> example). This flag is not yet exposed to the HTTP API, but it could be, via
> an X-Optimistic-Write header in doc PUT/POST requests and _bulk_docs, for
> example. We’ve found this useful when the client knows the documents to
> write don’t exist yet in the database and we aren’t already I/O bound, such
> as when SSDs are used.
>
> We used relaximation, Filipe’s basho_bench-based tests [5] and the Erlang
> test module mentioned before [6, 7], exposed via HTTP. Here are some
> benchmark results.
>
>
> # Using the Erlang test module (test output)
>
> ## 1Kb documents, 10 concurrent writers, batches of 500 docs
>
> trunk before snappy was added:
>
> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":270071}
>
> trunk:
>
> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":157328}
>
> trunk + async writes (and snappy):
>
> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":121518}
>
> ## 2.5Kb documents, 10 concurrent writers, batches of 500 docs
>
> trunk before snappy was added:
>
> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":507098}
>
> trunk:
>
> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":230391}
>
> trunk + async writes (and snappy):
>
> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":190151}
>
>
> # basho_bench tests, via the public HTTP API
>
> ## batches of 1 1Kb doc, 50 writers, 5-minute run
>
> trunk: 147 702 docs written
> branch: 149 534 docs written
>
> ## batches of 10 1Kb docs, 50 writers, 5-minute run
>
> trunk: 878 520 docs written
> branch: 991 330 docs written
>
> ## batches of 100 1Kb docs, 50 writers, 5-minute run
>
> trunk: 1 627 600 docs written
> branch: 1 865 800 docs written
>
> ## batches of 1 2.5Kb doc, 50 writers, 5-minute run
>
> trunk: 142 531 docs written
> branch: 143 012 docs written
>
> ## batches of 10 2.5Kb docs, 50 writers, 5-minute run
>
> trunk: 724 880 docs written
> branch: 780 690 docs written
>
> ## batches of 100 2.5Kb docs, 50 writers, 5-minute run
>
> trunk: 1 028 600 docs written
> branch: 1 152 800 docs written
>
>
> # basho_bench tests, via the internal Erlang API
> ## batches of 100 2.5Kb docs, 50 writers, 5-minute run
>
> trunk: 3 170 100 docs written
> branch: 3 359 900 docs written
>
>
> # Relaximation tests
>
> 1Kb docs:
>
> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b83002a1a
>
> 2.5Kb docs:
>
> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b830022c0
>
> 4Kb docs:
>
> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b8300330d
>
>
> All the documents used for these tests can be found at:
> https://github.com/fdmanana/basho_bench_couch/tree/master/couch_docs
>
>
> Now some view indexing tests.
>
> # indexer_test_2 database
> (http://fdmanana.couchone.com/_utils/database.html?indexer_test_2)
>
> ## trunk
>
> $ time curl
> http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
> {"total_rows":1102400,"offset":0,"rows":[
> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
> ]}
>
> real 20m51.388s
> user 0m0.040s
> sys 0m0.000s
>
>
> ## branch async writes
>
> $ time curl
> http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
> {"total_rows":1102400,"offset":0,"rows":[
> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
> ]}
>
> real 15m17.908s
> user 0m0.008s
> sys 0m0.020s
>
>
> # indexer_test_3 database
> (http://fdmanana.couchone.com/_utils/database.html?indexer_test_3)
>
> ## trunk
>
> $ time curl
> http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
> {"total_rows":1102400,"offset":0,"rows":[
> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
> ]}
>
> real 21m17.346s
> user 0m0.012s
> sys 0m0.028s
>
> ## branch async writes
>
> $ time curl
> http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
> {"total_rows":1102400,"offset":0,"rows":[
> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
> ]}
>
> real 16m28.558s
> user 0m0.012s
> sys 0m0.020s
>
> We don’t see nearly as big improvements for single-write-per-request
> benchmarks as we do with bulk writes. This is due to the HTTP request
> overhead and our own inefficiencies at that layer. We still have lots of
> room for optimization at the networking layer.
>
> We'd like to merge this code into trunk by next Wednesday. Please
> respond with any improvements, objections or comments by then. Thanks!
>
> -Damien
>
>
> [1] -
> http://blog.couchbase.com/driving-performance-improvements-couchbase-single-server-two-dot-zero
> [2] - https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test
> [3] - https://github.com/fdmanana/couchdb/compare/async_file_writes
> [4] - https://issues.apache.org/jira/browse/COUCHDB-1084
> [5] - https://github.com/fdmanana/basho_bench_couch
> [6] - https://github.com/fdmanana/couchdb/blob/async_file_writes/gen_load.sh
> [7] -
> https://github.com/fdmanana/couchdb/blob/async_file_writes/src/couchdb/couch_internal_load_gen.erl