> On 30. Jan 2019, at 14:22, Jan Lehnardt <j...@apache.org> wrote:
>
> Thanks Ilya for getting this started!
>
> Two quick notes on this one:
>
> 1. note that JSON does not guarantee object key order and that CouchDB has
> never guaranteed it either, and with say emit(doc.foo, doc.bar), if either
> emit() parameter was an object, the undefined-sort-order of SpiderMonkey
> would mix things up. While worth bringing up, this is not a BC break.
>
> 2. This would have the fun property of being able to rename a key inside all
> docs that have that key.
…in one short operation.

Best
Jan
—

>
> Best
> Jan
> —
>
>> On 30. Jan 2019, at 14:05, Ilya Khlopotov <iil...@apache.org> wrote:
>>
>> # First proposal
>>
>> In order to overcome FoundationDB limitations on key size (10 kB) and value
>> size (100 kB) we could use the following approach.
>>
>> Below, the paths use slashes for illustration purposes only. We can use
>> nested subspaces, tuples, directories or something else.
>>
>> - Store documents in a subspace or directory (to keep the prefix for a key
>> short)
>> - When we store the document we would enumerate all field names (0 and 1 are
>> reserved) and store the mapping table in the key which looks like:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / 0
>> ```
>> - Flatten the JSON document (convert it into key value pairs where the key
>> is `JSON_PATH` and value is `SCALAR_VALUE`)
>> - Replace elements of JSON_PATH with integers from the mapping table we
>> constructed earlier
>> - When we have an array, use `1 / {array_idx}`
>> - Store scalar values in the keys which look like the following (we use
>> `JSON_PATH` with integers).
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>> ```
>> - If the scalar value exceeds 100 kB we would split it and store every part
>> under a key constructed as:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>> ```
>>
>> Since all parts of the document are stored under a common
>> `{DB_DOCS_NS} / {DOC_KEY}` prefix, they will be stored on the same server
>> most of the time. The document can be retrieved by using a range query
>> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 0xFF")`).
>> We can reconstruct the document since the mapping is returned as well.
>>
>> The downside of this approach is that we wouldn't be able to ensure the same
>> order of keys in the JSON object. Currently the `jiffy` JSON encoder
>> respects the order of keys.
>> ```
>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>> <<"{\"bbb\":1,\"aaa\":12}">>
>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>> <<"{\"aaa\":12,\"bbb\":1}">>
>> ```
>>
>> Best regards,
>> iilyak
>>
>> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote:
>>> As you might already know, FoundationDB has a number of limitations
>>> which influence the way we might store JSON documents. The limitations are:
>>>
>>> | limitation            | recommended value | recommended max | absolute max |
>>> |-----------------------|------------------:|----------------:|-------------:|
>>> | transaction duration  |                   |                 |        5 sec |
>>> | transaction data size |                   |                 |        10 MB |
>>> | key size              |          32 bytes |            1 kB |        10 kB |
>>> | value size            |                   |           10 kB |       100 kB |
>>>
>>> In order to fit the JSON document into 100 kB we would have to partition it
>>> in some way. There are three ways of partitioning the document:
>>> 1. store multiple binary blobs (parts) in different keys
>>> 2. flatten the JSON structure and store every path leading to a scalar value
>>> under its own key
>>> 3. measure the size of different branches of a tree representing the JSON
>>> document (while we parse) and use another key for the branch when we are
>>> about to exceed the limit
>>>
>>> - The first approach is the simplest but it wouldn't allow us to access
>>> parts of the document.
>>> - The downsides of the second approach are:
>>>   - the flattened JSON structure would have long paths, which means longer keys
>>>   - a scalar value cannot be more than 100 kB (unless we split it as well)
>>> - The third approach falls short in cases when the structure of the document
>>> doesn't allow branches to be cut off cleanly:
>>>   - complex rules to handle all corner cases
>>>
>>> The goals of this thread are:
>>> - to collect ideas on how to encode and store the JSON document
>>> - to comment on the collected ideas
>>>
>>> Non goals:
>>> - the storage of metadata for the document would be discussed elsewhere
>>>   - tombstones
>>>   - edit conflicts
>>>   - revisions
>>>
>>> Best regards,
>>> iilyak
>>>
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
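
For illustration only, here is a minimal Python sketch of the flattening idea from the first proposal quoted above. It is not CouchDB or FoundationDB code. The names `flatten` and `unflatten` are hypothetical, key tuples stand in for real subspaces or directories, a plain dict stands in for the store, and scalar values are JSON-encoded strings. It assumes the document root is an object, and it leaves out both the splitting of values above 100 kB and the handling of empty objects and arrays, which the proposal does not specify.

```python
import json

MAPPING_SLOT = 0    # reserved: the mapping table lives under {DOC_KEY} / 0
ARRAY_MARKER = 1    # reserved: arrays are encoded as 1 / {array_idx}
FIRST_FIELD_ID = 2  # assumption: field names are numbered from 2 upwards


def flatten(doc_key, doc):
    """Turn one JSON object into {key_tuple: encoded_scalar} rows."""
    field_ids = {}  # field name -> integer id, in order of first appearance
    rows = {}

    def field_id(name):
        if name not in field_ids:
            field_ids[name] = FIRST_FIELD_ID + len(field_ids)
        return field_ids[name]

    def walk(node, path):
        if isinstance(node, dict):
            for name, child in node.items():
                walk(child, path + (field_id(name),))
        elif isinstance(node, list):
            for idx, child in enumerate(node):
                walk(child, path + (ARRAY_MARKER, idx))
        else:
            rows[(doc_key,) + path] = json.dumps(node)

    walk(doc, ())
    # Store the id -> name mapping next to the scalar rows.
    mapping = {str(i): name for name, i in field_ids.items()}
    rows[(doc_key, MAPPING_SLOT)] = json.dumps(mapping)
    return rows


def decode_path(path, names):
    """Translate integer path elements back to field names / array indexes."""
    steps = []
    elements = iter(path)
    for element in elements:
        if element == ARRAY_MARKER:
            steps.append(next(elements))   # int: array index
        else:
            steps.append(names[element])   # str: object field name
    return steps


def unflatten(doc_key, rows):
    """Rebuild the object from the rows of one document."""
    raw = json.loads(rows[(doc_key, MAPPING_SLOT)])
    names = {int(i): name for i, name in raw.items()}
    doc = {}
    keys = [k for k in rows if k[0] == doc_key and k[1:] != (MAPPING_SLOT,)]
    for key in sorted(keys):  # sorted order makes array indexes ascend
        steps = decode_path(key[1:], names)
        node = doc
        for step, following in zip(steps, steps[1:]):
            empty = [] if isinstance(following, int) else {}
            if isinstance(node, list):
                if step == len(node):
                    node.append(empty)
                node = node[step]
            else:
                node = node.setdefault(step, empty)
        value = json.loads(rows[key])
        if isinstance(node, list):
            node.append(value)
        else:
            node[steps[-1]] = value
    return doc


doc = {"name": "x", "tags": ["a", "b"], "nested": {"name": 1, "list": [{"k": None}, 2.5]}}
rows = flatten("doc1", doc)
restored = unflatten("doc1", rows)
```

In this sketch the rebuilt object has fields in id order, not in the original order, which is the key-order downside the proposal mentions. Renaming a field everywhere in a document would only touch the single mapping row.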