Storing document bodies as raw JSON binaries instead of serialized JSON terms
------------------------------------------------------------------------------

                 Key: COUCHDB-1092
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1092
             Project: CouchDB
          Issue Type: Improvement
          Components: Database Core
            Reporter: Filipe Manana
            Assignee: Filipe Manana


Currently we store document bodies as EJSON serialized with the Erlang term_to_binary/1
BIF. The proposed patch changes the database file format so that, instead of storing
serialized EJSON document bodies, it stores raw JSON binaries.

The GitHub branch is at:  https://github.com/fdmanana/couchdb/tree/raw_json_docs

Advantages:

* what we write to disk is much smaller - a raw JSON binary can easily be up to 50%
  smaller (at least according to the tests I did)

* when serving documents to a client we no longer need to JSON encode the document body
  read from disk - this applies to individual document requests, view queries with
  ?include_docs=true, pull and push replications, and possibly other use cases.
  We just grab the stored body and prepend the _id, _rev and all the other necessary
  metadata fields (this is done with simple Erlang binary operations)

* we avoid copying EJSON terms between request handlers and the db updater processes,
  between the work queues and the view updater process, between replicator processes,
  etc.

* before sending a document to the JavaScript view server, we no longer need to convert
  it from EJSON to JSON
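The splicing described in the second point above can be pictured with a minimal Python sketch. The patch itself does this with Erlang binary operations; the helper name `serve_doc` is hypothetical, and the sketch assumes the stored body is a compact JSON object whose metadata fields were removed before it was written (other metadata fields, such as _attachments, are left out for brevity).

```python
import json


def serve_doc(doc_id: str, rev: str, body: bytes) -> bytes:
    """Build a document response from a body stored as raw JSON bytes.

    Assumes `body` is the compact JSON encoding of a JSON object whose
    metadata fields (_id, _rev, ...) were removed before it was stored.
    The body is never decoded; only the metadata values are JSON encoded.
    """
    prefix = (b'{"_id":' + json.dumps(doc_id).encode("utf-8")
              + b',"_rev":' + json.dumps(rev).encode("utf-8"))
    inner = body.strip()
    if not (inner.startswith(b"{") and inner.endswith(b"}")):
        raise ValueError("stored body must be a JSON object")
    if inner[1:-1].strip():
        # Non-empty object: drop its opening brace and splice the rest after a comma.
        return prefix + b"," + inner[1:]
    # Empty object ("{}"): just close the metadata object.
    return prefix + b"}"


stored = b'{"title":"hello","tags":["a","b"]}'
print(serve_doc("doc1", "1-abc", stored).decode("utf-8"))
# {"_id":"doc1","_rev":"1-abc","title":"hello","tags":["a","b"]}
```

Only the short metadata values go through the JSON encoder, so the cost of serving a document no longer grows with the size or nesting depth of its body.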

The changes to the document write workflow are minimal: after JSON decoding the
document's JSON into EJSON and removing the top-level metadata fields (_id, _rev, etc.),
it JSON encodes the resulting EJSON body into a binary. This costs some CPU, of course,
but it brings 2 advantages:

1) we avoid the EJSON copy between the request process and the database updater process -
   for any realistic document size (4 KB or more) this copy can be very expensive,
   especially when there are many nested structures (lists inside objects inside lists,
   etc.)

2) before writing anything to the file, we do a term_to_binary([Len, Md5, TheThingToWrite])
   and then write the result to the file. A term_to_binary call with a binary as the input
   is very fast compared to a term_to_binary call with EJSON (or some other nested
   structure) as the input

I think both of these compensate for the cost of JSON encoding the body once the metadata
fields have been separated from the rest of the document.
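A rough Python analogue of this write path, as a sketch only: `split_doc`, `frame` and `unframe` are hypothetical helpers, treating every top-level key that starts with "_" as metadata is a simplifying assumption, and the 4-byte length plus 16-byte MD5 header merely mirrors the idea of term_to_binary([Len, Md5, TheThingToWrite]) rather than reproducing CouchDB's on-disk format. The performance argument itself (term_to_binary on a binary vs on nested EJSON) is specific to Erlang and is not demonstrated here.

```python
import hashlib
import json
import struct


def split_doc(doc_json: bytes):
    """Decode a document, separate top-level metadata, re-encode the body.

    Simplification: any top-level key starting with "_" counts as metadata.
    """
    doc = json.loads(doc_json)
    meta = {k: v for k, v in doc.items() if k.startswith("_")}
    body = {k: v for k, v in doc.items() if not k.startswith("_")}
    # Compact separators and raw UTF-8 keep the stored binary small.
    body_bin = json.dumps(body, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return meta, body_bin


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length (4 bytes, big-endian) and MD5 digest."""
    return struct.pack(">I", len(payload)) + hashlib.md5(payload).digest() + payload


def unframe(data: bytes) -> bytes:
    """Inverse of frame(): verify the MD5 digest and return the payload."""
    (length,) = struct.unpack(">I", data[:4])
    digest, payload = data[4:20], data[20:20 + length]
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("MD5 mismatch: corrupted payload")
    return payload


meta, body_bin = split_doc(b'{"_id":"doc1","_rev":"1-abc","title":"hello"}')
print(meta)      # {'_id': 'doc1', '_rev': '1-abc'}
print(body_bin)  # b'{"title":"hello"}'
```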

The following relaximation graph, for 4 KB documents, shows a significant performance
increase for both writes and reads, especially reads:

http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef63400b94f


I've also run a few tests to measure the improvement when querying a view for the
first time, without ?stale=ok. The size difference between the databases (after
compaction) is also very significant: this change can reduce the database size
considerably in common cases, as the numbers below show.

The test databases were created in an instance built from that experimental branch and
then replicated into a CouchDB instance built from the current trunk. Finally, both
databases were compacted so that their final sizes could be compared fairly.

The databases contain the following view:
{
    "_id": "_design/test",
    "language": "javascript",
    "views": {
        "simple": {
            "map": "function(doc) { emit(doc.float1, doc.strings[1]); }"
        }
    }
}


## Database with 500 000 docs of 2.5 KB each

Document template is at:  
https://github.com/fdmanana/couchdb/blob/raw_json_docs/doc_2_5k.json

Sizes (branch vs trunk):

$ du -m couchdb/tmp/lib/disk_json_test.couch 
1996    couchdb/tmp/lib/disk_json_test.couch

$ du -m couchdb-trunk/tmp/lib/disk_ejson_test.couch 
2693    couchdb-trunk/tmp/lib/disk_ejson_test.couch


Time, from a user's perspective, to build the view index from scratch:

$ time curl http://localhost:5984/disk_json_test/_design/test/_view/simple?limit=1
{"total_rows":500000,"offset":0,"rows":[
{"id":"0000076a-c1ae-4999-b508-c03f4d0620c5","key":null,"value":"wfxuF3N8XEK6"}
]}

real    6m6.740s
user    0m0.016s
sys     0m0.008s

$ time curl http://localhost:5985/disk_ejson_test/_design/test/_view/simple?limit=1
{"total_rows":500000,"offset":0,"rows":[
{"id":"0000076a-c1ae-4999-b508-c03f4d0620c5","key":null,"value":"wfxuF3N8XEK6"}
]}

real    15m41.439s
user    0m0.012s
sys     0m0.012s



## Database with 100 000 docs of 11 KB each

Document template is at:  
https://github.com/fdmanana/couchdb/blob/raw_json_docs/doc_11k.json

Sizes (branch vs trunk):

$ du -m couchdb/tmp/lib/disk_json_test_11kb.couch
1185    couchdb/tmp/lib/disk_json_test_11kb.couch

$ du -m couchdb-trunk/tmp/lib/disk_ejson_test_11kb.couch
2202    couchdb-trunk/tmp/lib/disk_ejson_test_11kb.couch


Time, from a user's perspective, to build the view index from scratch:

$ time curl http://localhost:5984/disk_json_test_11kb/_design/test/_view/simple?limit=1
{"total_rows":100000,"offset":0,"rows":[
{"id":"00001511-831c-41ff-9753-02861bff73b3","key":null,"value":"2fQUbzRUax4A"}
]}

real    4m19.306s
user    0m0.008s
sys     0m0.004s

$ time curl http://localhost:5985/disk_ejson_test_11kb/_design/test/_view/simple?limit=1
{"total_rows":100000,"offset":0,"rows":[
{"id":"00001511-831c-41ff-9753-02861bff73b3","key":null,"value":"2fQUbzRUax4A"}
]}

real    18m46.051s
user    0m0.008s
sys     0m0.016s
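For a quick summary of the two runs above, the raw numbers can be turned into relative figures with a few lines of Python. The values are copied from the du -m and time outputs (the `real` wall-clock times converted to seconds); `summarize` is just an illustrative helper.

```python
def summarize(branch_mb, trunk_mb, branch_s, trunk_s):
    """Return (% size reduction of branch vs trunk, view build speedup)."""
    return 100 * (1 - branch_mb / trunk_mb), trunk_s / branch_s


runs = {
    # name: (branch MB, trunk MB, branch seconds, trunk seconds)
    "500 000 docs x 2.5 KB": (1996, 2693, 6 * 60 + 6.740, 15 * 60 + 41.439),
    "100 000 docs x 11 KB": (1185, 2202, 4 * 60 + 19.306, 18 * 60 + 46.051),
}

for name, figures in runs.items():
    reduction, speedup = summarize(*figures)
    print(f"{name}: {reduction:.0f}% smaller, view built {speedup:.1f}x faster")
# 500 000 docs x 2.5 KB: 26% smaller, view built 2.6x faster
# 100 000 docs x 11 KB: 46% smaller, view built 4.3x faster
```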



All in all, I haven't yet seen any disadvantage to this approach. Also, the code changes
don't add complexity. I think the performance and disk space gains it brings are very
positive.

This branch still needs to be polished in a few places, but I think it isn't far from
being mature.

Other experiments that could be done are storing view values as raw JSON binaries as well
(instead of EJSON) and optionally compressing the stored JSON binaries (since it's pure
text, the compression ratio should be very high). However, I would prefer to do these
other 2 suggestions in separate branches/patches - I haven't actually tested either of
them yet, so maybe they won't bring significant gains.
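As an illustration of the compression idea (untested in the branch), a stored JSON body could be passed through zlib before it is written. The helpers below are hypothetical, zlib is just one possible codec, and the sample body is deliberately repetitive, so it compresses far better than typical documents would. One trade-off is that reads would then pay a decompression step before the body can be sent to a client.

```python
import json
import zlib


def compress_body(body_bin: bytes, level: int = 6) -> bytes:
    """Compress a raw JSON body before writing it (hypothetical helper)."""
    return zlib.compress(body_bin, level)


def decompress_body(stored: bytes) -> bytes:
    """Recover the raw JSON body read back from disk."""
    return zlib.decompress(stored)


sample = json.dumps(
    {"float1": 0.5, "strings": ["wfxuF3N8XEK6"] * 50},
    separators=(",", ":"),
).encode("utf-8")
packed = compress_body(sample)
print(len(sample), len(packed))  # original vs compressed size in bytes
```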

Thoughts? :)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
