This is interesting work. I notice some substantial changes to couch_btree, a new query_modify_raw, etc.
I'm wondering, though, if we'd be better off basing these changes on the refactored version of couch_btree that davisp has [1]. I haven't looked at it too closely or tested with it, but if I recall correctly, the goal was first to achieve a more readable version with identical semantics so that we could then move forward with improvements.

[1] https://github.com/davisp/couchdb/commit/37c1c9b4b90f6c0f3c22b75dfb2ae55c8b708ab1

On Jun 24, 2011, at 6:06 AM, Filipe David Manana wrote:

> Thanks Adam.
>
> Don't get too scared :) Ignore the commit history and just look at
> github's "Files changed" tab; the modification summary is:
>
> "Showing 19 changed files with 730 additions and 402 deletions."
>
> More than half of those commits were merges with trunk, many snappy
> refactorings (before it was added to trunk), and other experiments that
> were reverted afterwards.
> We'll try to break this into 2 or 3 patches.
>
> So the single patch is something relatively small:
> https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test.diff
>
> On Fri, Jun 24, 2011 at 4:05 AM, Adam Kocoloski <[email protected]> wrote:
>> Hi Damien, I'd like to see these 220 commits rebased into a set of logical
>> patches against trunk. It'll make the review easier and will help future
>> devs track down any bugs that are introduced. Best,
>>
>> Adam
>>
>> On Jun 23, 2011, at 6:49 PM, Damien Katz wrote:
>>
>>> Hi everyone,
>>>
>>> As many of you know, Filipe and I have been working on improving
>>> performance, especially write performance [1]. This work has been
>>> public in the Couchbase github account since the beginning, and the
>>> non-Couchbase-specific changes are now isolated in [2] and [3].
>>> In [3] there's an Erlang module that is used to test the performance
>>> when writing and updating batches of documents with concurrency, which
>>> was used, amongst other tools, to measure the performance gains.
>>> This module bypasses the network stack and the JSON parsing, so that
>>> it basically allows us to see more easily how significant the changes
>>> in couch_file, couch_db and couch_db_updater are.
>>>
>>> The main and most important change is asynchronous writes. The file
>>> module no longer blocks callers until the write calls complete.
>>> Instead, it immediately replies to the caller with the position in the
>>> file where the data is going to be written. The data is then sent to a
>>> dedicated loop process that is continuously writing the data it
>>> receives, from the couch_file gen_server, to disk (batching when
>>> possible). This allows callers (such as the db updater, for example)
>>> to issue write calls and keep doing other work (preparing documents,
>>> etc.) while the writes are being done in parallel. After issuing all
>>> the writes, callers simply call the new 'flush' function in the
>>> couch_file gen_server, which blocks the caller until everything has
>>> effectively been written to disk - normally this flush call ends up
>>> not blocking the caller at all, or blocks it only for a very short
>>> period.
>>>
>>> There are other changes, such as avoiding 2 btree lookups per document
>>> ID (COUCHDB-1084 [4]), faster sorting in the updater (O(n log n) vs
>>> O(n^2)), and avoiding sorting already sorted lists in the updater.
>>>
>>> Checking whether attachments are compressible was also moved into a
>>> new module/process. We verified this took much CPU time when all or
>>> most of the documents to write/update have attachments - building the
>>> regexps and matching against them for every single attachment is
>>> surprisingly expensive.
>>>
>>> There's also a new couch_db:update_doc/s flag named 'optimistic' which
>>> basically changes the behaviour to write the document bodies before
>>> entering the updater and to skip some attachment-related checks
>>> (duplicated names, for example).
>>> This flag is not yet exposed to the HTTP API, but it could be, via an
>>> X-Optimistic-Write header in the doc PUT/POST requests and in
>>> _bulk_docs, for example. We've seen this as a win when the client
>>> knows that the documents to write don't exist yet in the database and
>>> we aren't already IO bound, such as when SSDs are used.
>>>
>>> We used relaximation, Filipe's basho_bench based tests [5] and the
>>> Erlang test module mentioned before [6, 7], exposed via HTTP. Here
>>> follow some benchmark results.
>>>
>>>
>>> # Using the Erlang test module (test output)
>>>
>>> ## 1Kb documents, 10 concurrent writers, batches of 500 docs
>>>
>>> trunk before snappy was added:
>>>
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":270071}
>>>
>>> trunk:
>>>
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":157328}
>>>
>>> trunk + async writes (and snappy):
>>>
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":121518}
>>>
>>> ## 2.5Kb documents, 10 concurrent writers, batches of 500 docs
>>>
>>> trunk before snappy was added:
>>>
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":507098}
>>>
>>> trunk:
>>>
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":230391}
>>>
>>> trunk + async writes (and snappy):
>>>
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":190151}
>>>
>>>
>>> # basho_bench tests, via the public HTTP APIs
>>>
>>> ## batches of 1 1Kb doc, 50 writers, 5 minutes run
>>>
>>> trunk: 147 702 docs written
>>> branch: 149 534 docs written
>>>
>>> ## batches of 10 1Kb docs, 50 writers, 5 minutes run
>>>
>>> trunk: 878 520 docs written
>>> branch: 991 330 docs written
>>>
>>> ## batches of 100 1Kb docs, 50 writers, 5 minutes run
>>>
>>> trunk: 1 627 600 docs written
>>> branch: 1 865 800 docs written
>>>
>>> ## batches of 1 2.5Kb doc, 50 writers, 5 minutes run
>>>
>>> trunk: 142 531 docs written
>>> branch: 143 012 docs written
>>>
>>> ## batches of 10 2.5Kb docs, 50 writers, 5 minutes run
>>>
>>> trunk: 724 880 docs written
>>> branch: 780 690 docs written
>>>
>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>>
>>> trunk: 1 028 600 docs written
>>> branch: 1 152 800 docs written
>>>
>>>
>>> # basho_bench tests, via the internal Erlang APIs
>>>
>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>>
>>> trunk: 3 170 100 docs written
>>> branch: 3 359 900 docs written
>>>
>>>
>>> # Relaximation tests
>>>
>>> 1Kb docs:
>>>
>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b83002a1a
>>>
>>> 2.5Kb docs:
>>>
>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b830022c0
>>>
>>> 4Kb docs:
>>>
>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b8300330d
>>>
>>>
>>> All the documents used for these tests can be found at:
>>> https://github.com/fdmanana/basho_bench_couch/tree/master/couch_docs
>>>
>>>
>>> Now some view indexing tests.
>>>
>>> # indexer_test_2 database
>>> (http://fdmanana.couchone.com/_utils/database.html?indexer_test_2)
>>>
>>> ## trunk
>>>
>>> $ time curl http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>>
>>> real 20m51.388s
>>> user 0m0.040s
>>> sys 0m0.000s
>>>
>>>
>>> ## branch async writes
>>>
>>> $ time curl http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>>
>>> real 15m17.908s
>>> user 0m0.008s
>>> sys 0m0.020s
>>>
>>>
>>> # indexer_test_3 database
>>> (http://fdmanana.couchone.com/_utils/database.html?indexer_test_3)
>>>
>>> ## trunk
>>>
>>> $ time curl http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>>
>>> real 21m17.346s
>>> user 0m0.012s
>>> sys 0m0.028s
>>>
>>> ## branch async writes
>>>
>>> $ time curl http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>>
>>> real 16m28.558s
>>> user 0m0.012s
>>> sys 0m0.020s
>>>
>>> We don't see nearly as big improvements for single-write-per-request
>>> benchmarks as we do with bulk writes. This is due to the HTTP request
>>> overhead and our own inefficiencies at that layer. We still have lots
>>> of room for optimizations at the networking layer.
>>>
>>> We'd like to merge this code into trunk next week, by next Wednesday.
>>> Please respond with any improvements, objections or comments by then.
>>> Thanks!
>>>
>>> -Damien
>>>
>>>
>>> [1] - http://blog.couchbase.com/driving-performance-improvements-couchbase-single-server-two-dot-zero
>>> [2] - https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test
>>> [3] - https://github.com/fdmanana/couchdb/compare/async_file_writes
>>> [4] - https://issues.apache.org/jira/browse/COUCHDB-1084
>>> [5] - https://github.com/fdmanana/basho_bench_couch
>>> [6] - https://github.com/fdmanana/couchdb/blob/async_file_writes/gen_load.sh
>>> [7] - https://github.com/fdmanana/couchdb/blob/async_file_writes/src/couchdb/couch_internal_load_gen.erl
>>
>>
>
>
>
> --
> Filipe David Manana,
> [email protected], [email protected]
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
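[Editor's note] The asynchronous write mechanism Damien describes (reply immediately with the future file position, hand the bytes to a dedicated writer loop, and block only in an explicit flush) can be sketched as a toy model. This is not CouchDB's actual Erlang couch_file implementation; it is an illustrative Python sketch where all names (`AsyncWriter`, `append`, `flush`) are invented for the example:

```python
import os
import queue
import threading

class AsyncWriter:
    """Toy model of the append-only async-write pattern: append() returns
    the position the data will occupy immediately, a dedicated writer
    thread performs the actual disk writes, and flush() blocks only until
    everything queued so far has reached the file."""

    def __init__(self, path):
        self._file = open(path, "wb")
        self._pos = 0                # next append offset, known without touching disk
        self._queue = queue.Queue()  # data handed off to the writer loop
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def append(self, data: bytes) -> int:
        """Reply immediately with the file position the data will get."""
        pos = self._pos
        self._pos += len(data)
        self._queue.put(data)        # actual write happens in the writer loop
        return pos

    def flush(self):
        """Block until every write issued so far is effectively on disk."""
        done = threading.Event()
        self._queue.put(done)        # sentinel: writer sets it once reached
        done.wait()
        os.fsync(self._file.fileno())

    def _write_loop(self):
        # The real writer process described in the thread also batches
        # consecutive writes; this loop writes one item at a time for clarity.
        while True:
            item = self._queue.get()
            if isinstance(item, threading.Event):
                self._file.flush()
                item.set()
            else:
                self._file.write(item)
```

A caller can issue many appends back to back, keep preparing the next batch while the writer thread drains the queue, and only pay for a (usually near-instant) flush at commit time, which mirrors the flow described for the db updater above.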
