I do not think we should store the revtree as a blob. The design where each 
edit branch is its own KV should save on network IO and CPU cycles for normal 
updates. We’ve performed too many heroics to keep couch_key_tree from stalling 
entire databases when trying to update a single document with a wide revision 
tree; I would much prefer to ignore other edit branches entirely when all we’re 
doing is extending one of them.
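
To illustrate (a sketch only; the subspace name and key layout are placeholders 
borrowed from the revision metadata scheme further down the thread), extending 
one branch then amounts to a couple of blind writes that never read or touch 
the sibling branches. In the FoundationDB Python bindings:

    import fdb
    fdb.api_version(630)

    @fdb.transactional
    def extend_branch(tr, revs, doc_id, deleted, old_pos, old_rev, new_rev, ancestors):
        # replace only the KV for the branch being extended; every other
        # edit branch of this document is left unread and untouched
        tr.clear(revs.pack((doc_id, deleted, old_pos, old_rev)))
        tr[revs.pack((doc_id, deleted, old_pos + 1, new_rev))] = fdb.tuple.pack(tuple(ancestors))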

I also do not think we should store JSON documents as blobs, but it’s a closer 
call. Some of my reasoning for preferring the exploded path design:

- it lends itself nicely to sub-document operations, for which Jan crafted an 
RFC last year: https://github.com/apache/couchdb/issues/1559
- it optimizes the creation of Mango indexes on existing databases, since we 
only need to retrieve the value(s) we want to index (see the sketch after this 
list)
- it optimizes Mango queries that use field selectors
- anyone who wanted to try their hand at GraphQL will find it very handy: 
https://github.com/apache/couchdb/issues/1499
- looking further ahead, it lets us play with smarter leaf value types like 
Counters (yes I’m still on the CRDT bandwagon, sorry)
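
Here is the sketch referenced above (illustrative only; docs stands in for a 
hypothetical per-database by_id subspace, and values are assumed to be packed 
with the tuple layer). Pulling just the values under one field path of an 
exploded document is a single small range read, with no need to fetch or 
decode the rest of the body:

    import fdb
    fdb.api_version(630)

    @fdb.transactional
    def field_values(tr, docs, doc_id, field_path):
        # e.g. field_path = ("mymap", "blue"); only keys under that path are read
        prefix = docs.pack((doc_id,) + tuple(field_path))
        return [(docs.unpack(k)[1:], fdb.tuple.unpack(v)[0])
                for k, v in tr.get_range_startswith(prefix)]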

A few comments on the thread:

>>> * Most document bodies are probably going to be smaller than 100k. So in
>>> the majority of cases it would be one write / one read to update and fetch
>>> the document body.

We should test, but I expect reading 50 KB of data in a range query is almost 
as efficient as reading a single 50 KB value. Similarly, writes to a contiguous 
set of keys should be quite efficient.
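
A rough way to test that assumption (sketch only, not a serious benchmark; the 
subspace names, chunk size, and iteration count are arbitrary) is to store the 
same 50 KB body both ways and time the two read paths:

    import fdb, os, time
    fdb.api_version(630)
    db = fdb.open()

    blob = fdb.Subspace(("bench", "blob"))
    chunks = fdb.Subspace(("bench", "chunks"))

    @fdb.transactional
    def setup(tr, data, chunk_size=10_000):
        tr[blob.pack(("doc",))] = data
        for i in range(0, len(data), chunk_size):
            tr[chunks.pack(("doc", i // chunk_size))] = data[i:i + chunk_size]

    @fdb.transactional
    def read_blob(tr):
        # single-key read; the lazy value resolves when we take its length
        return len(tr[blob.pack(("doc",))])

    @fdb.transactional
    def read_chunks(tr):
        # one contiguous range read over the chunk keys
        return sum(len(v) for _, v in tr.get_range_startswith(chunks.pack(("doc",))))

    setup(db, os.urandom(50_000))
    for fn in (read_blob, read_chunks):
        t0 = time.perf_counter()
        for _ in range(100):
            fn(db)
        print(fn.__name__, (time.perf_counter() - t0) / 100)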

I am concerned about the overhead of repeating the field paths in every key 
with the exploded path option, given the absence of key prefix compression. 
That would be my main reason to acquiesce and throw away all the document 
structure.
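
Back-of-envelope, using the numbers from Ilya’s example further down the thread 
(10-level paths, 20-byte field names, ~1K leaf values of ~100 bytes each): a raw 
path key would carry roughly 10 * 20 = 200 bytes, so on the order of 200 KB of 
key material per revision on top of ~100 KB of actual values. Ilya’s estimate 
mitigates this with a per-document mapping table that swaps each field name for 
a small integer; without something like that, or prefix compression in the 
storage engine, the keys dominate the footprint.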

Adam

> On Feb 19, 2019, at 12:04 PM, Robert Newson <rnew...@apache.org> wrote:
> 
> I like the idea that we'd reuse the same pattern (but perhaps not the same 
> _code_) for doc bodies, revtree and attachments.
> 
> I hope we still get to delete couch_key_tree.erl, though.
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
> 
> On Tue, 19 Feb 2019, at 17:03, Jan Lehnardt wrote:
>> I like the idea from a “trying a simple thing first” perspective, but 
>> Nick’s points below are especially convincing to go with this for now.
>> 
>> Best
>> Jan
>> —
>> 
>>> On 19. Feb 2019, at 17:53, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Sorry for jumping in so late, I was following from the sidelines mostly. A
>>> lot of good discussion happening and am excited about the possibilities
>>> here.
>>> 
>>> I do like the simpler "chunking" approach for a few reasons:
>>> 
>>> * Most document bodies are probably going to be smaller than 100k. So in
>>> the majority of cases it would be one write / one read to update and fetch
>>> the document body.
>>> 
>>> * We could reuse the chunking code for attachment handling and possibly
>>> revision key trees. So it's the general pattern of uploading chunks to some
>>> prefix and, when finished, flipping an atomic toggle to make it current
>>> (see the sketch below).
>>> 
>>> * Do the same thing with revision trees and we could re-use the revision
>>> tree manipulation logic. That is, the key tree in most cases would be small
>>> enough to fit in 100k, but if it gets huge, it would get chunked. This would
>>> allow us to reuse all the battle-tested couch_key_tree code mostly as is.
>>> We even have property tests for it
>>> https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
>>> 
>>> * It removes the need to explain the max exploded path length limitation to
>>> customers.
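>>>
>>> A sketch of that pattern (illustrative only: the docs subspace, the choice
>>> of the revision id as the "toggle" value, and the 100k chunk size are all
>>> placeholders), using the FoundationDB Python bindings:
>>>
>>>     import fdb
>>>     fdb.api_version(630)
>>>     CHUNK = 100_000
>>>
>>>     @fdb.transactional
>>>     def write_body(tr, docs, doc_id, rev, body):
>>>         # upload the chunks under a prefix that includes the new revision...
>>>         for i in range(0, len(body), CHUNK):
>>>             tr[docs.pack((doc_id, "chunks", rev, i // CHUNK))] = body[i:i + CHUNK]
>>>         # ...then flip the toggle: point the "current" marker at the new rev
>>>         # (old chunks could be cleared here or by a lazy cleanup pass)
>>>         tr[docs.pack((doc_id, "current"))] = fdb.tuple.pack((rev,))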
>>> 
>>> Cheers,
>>> -Nick
>>> 
>>> 
>>> On Tue, Feb 19, 2019 at 11:18 AM Robert Newson <rnew...@apache.org> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> An alternative storage model that we should seriously consider is to
>>>> follow our current approach in couch_file et al. Specifically, that the
>>>> document _body_ is stored as an uninterpreted binary value. This would be
>>>> much like the obvious plan for attachment storage: a key prefix that
>>>> identifies the database and document, with the final item of that key tuple
>>>> being an incrementing integer. Each of those keys has a binary value of up to
>>>> 100k. Fetching all values with that key prefix, in fdb's natural ordering,
>>>> will yield the full document body, which can be JSON decoded for further
>>>> processing.
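>>>>
>>>> A sketch of the read side (again illustrative; it assumes the chunks were
>>>> written under a (doc, "chunks", rev, n) prefix with a tuple-packed "current"
>>>> marker, as in the write sketch above):
>>>>
>>>>     import fdb, json
>>>>     fdb.api_version(630)
>>>>
>>>>     @fdb.transactional
>>>>     def read_body(tr, docs, doc_id):
>>>>         # follow the "current" marker, then range-read the chunks in key
>>>>         # order and concatenate; the result is the raw JSON body
>>>>         rev = fdb.tuple.unpack(tr[docs.pack((doc_id, "current"))])[0]
>>>>         prefix = docs.pack((doc_id, "chunks", rev))
>>>>         return json.loads(b"".join(v for _, v in tr.get_range_startswith(prefix)))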
>>>> 
>>>> I like this idea, and I like Adam's original proposal to explode documents
>>>> into property paths. I have a slight preference for the simplicity of the
>>>> idea in the previous paragraph, not least because it's close to what we do
>>>> today. I also think it will be possible to migrate to alternative storage
>>>> models in future, and foundationdb's transaction support means we can do
>>>> this migration seamlessly should we come to it.
>>>> 
>>>> I'm very interested in knowing if anyone else is interested in going this
>>>> simple, or considers it a wasted opportunity relative to the 'exploded'
>>>> path.
>>>> 
>>>> B.
>>>> 
>>>> --
>>>> Robert Samuel Newson
>>>> rnew...@apache.org
>>>> 
>>>> On Mon, 4 Feb 2019, at 19:59, Robert Newson wrote:
>>>>> I've been remiss here in not posting the data model ideas that IBM
>>>>> worked up while we were thinking about using FoundationDB so I'm posting
>>>>> it now. This is Adam Kocoloski's original work, I am just transcribing
>>>>> it, and this is the context that the folks from the IBM side came in
>>>>> with, for full disclosure.
>>>>> 
>>>>> Basics
>>>>> 
>>>>> 1. All CouchDB databases are inside a Directory
>>>>> 2. Each CouchDB database is a Directory within that Directory
>>>>> 3. It's possible to list all subdirectories of a Directory, so
>>>>> `_all_dbs` is the list of directories from 1.
>>>>> 4. Each Directory representing a CouchDB database has several Subspaces:
>>>>> 4a. by_id/ doc subspace: actual document contents
>>>>> 4b. by_seq/versionstamp subspace: for the _changes feed
>>>>> 4c. index_definitions, indexes, ...
>>>>> 
>>>>> JSON Mapping
>>>>> 
>>>>> A hierarchical JSON object naturally maps to multiple KV pairs in FDB:
>>>>> 
>>>>> {
>>>>>   “_id”: “foo”,
>>>>>   “owner”: “bob”,
>>>>>   “mylist”: [1,3,5],
>>>>>   “mymap”: {
>>>>>       “blue”: “#0000FF”,
>>>>>       “red”: “#FF0000”
>>>>>   }
>>>>> }
>>>>> 
>>>>> maps to
>>>>> 
>>>>> (“foo”, “owner”) = “bob”
>>>>> (“foo”, “mylist”, 0) = 1
>>>>> (“foo”, “mylist”, 1) = 3
>>>>> (“foo”, “mylist”, 2) = 5
>>>>> (“foo”, “mymap”, “blue”) = “#0000FF”
>>>>> (“foo”, “mymap”, “red”) = “#FF0000”
>>>>> 
>>>>> NB: this means that the 100KB limit applies to individual leaves in the
>>>>> JSON object, not the entire doc.
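>>>>>
>>>>> For illustration, the explosion itself is a small recursive walk (sketch
>>>>> only; docs is a hypothetical by_id subspace and scalar values are packed
>>>>> with the tuple layer as a stand-in encoding):
>>>>>
>>>>>     import fdb, json
>>>>>     fdb.api_version(630)
>>>>>
>>>>>     def explode(obj, path=()):
>>>>>         # flatten a decoded JSON value into (path, scalar) pairs
>>>>>         if isinstance(obj, dict):
>>>>>             for k, v in obj.items():
>>>>>                 yield from explode(v, path + (k,))
>>>>>         elif isinstance(obj, list):
>>>>>             for i, v in enumerate(obj):
>>>>>                 yield from explode(v, path + (i,))
>>>>>         else:
>>>>>             yield path, obj
>>>>>
>>>>>     @fdb.transactional
>>>>>     def write_doc(tr, docs, doc_id, body):
>>>>>         # clear any previous KVs for this doc, then write one KV per leaf
>>>>>         tr.clear_range_startswith(docs.pack((doc_id,)))
>>>>>         for path, value in explode(json.loads(body)):
>>>>>             tr[docs.pack((doc_id,) + path)] = fdb.tuple.pack((value,))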
>>>>> 
>>>>> Edit Conflicts
>>>>> 
>>>>> We need to account for the presence of conflicts in various levels of
>>>>> the doc due to replication.
>>>>> 
>>>>> The proposal is to create a special value indicating that the subtree below
>>>>> our current cursor position is in an unresolvable conflict. Then add
>>>>> additional KV pairs below to describe the conflicting entries.
>>>>> 
>>>>> KV data model allows us to store these efficiently and minimize
>>>>> duplication of data:
>>>>> 
>>>>> A document with these two conflicts:
>>>>> 
>>>>> {
>>>>>   “_id”: “foo”,
>>>>>   “_rev”: “1-abc”,
>>>>>   “owner”: “alice”,
>>>>>   “active”: true
>>>>> }
>>>>> {
>>>>>   “_id”: “foo”,
>>>>>   “_rev”: “1-def”,
>>>>>   “owner”: “bob”,
>>>>>   “active”: true
>>>>> }
>>>>> 
>>>>> could be stored thus:
>>>>> 
>>>>> (“foo”, “active”) = true
>>>>> (“foo”, “owner”) = kCONFLICT
>>>>> (“foo”, “owner”, “1-abc”) = “alice”
>>>>> (“foo”, “owner”, “1-def”) = “bob”
>>>>> 
>>>>> So long as `kCONFLICT` is set at the top of the conflicting subtree this
>>>>> representation can handle conflicts of different data types as well.
>>>>> 
>>>>> Missing fields need to be handled explicitly:
>>>>> 
>>>>> {
>>>>> “_id”: “foo”,
>>>>> “_rev”: “1-abc”,
>>>>> “owner”: “alice”,
>>>>> “active”: true
>>>>> }
>>>>> 
>>>>> {
>>>>> “_id”: “foo”,
>>>>> “_rev”: “1-def”,
>>>>> “owner”: {
>>>>>   “name”: “bob”,
>>>>>   “email”: “b...@example.com”
>>>>> }
>>>>> }
>>>>> 
>>>>> could be stored thus:
>>>>> 
>>>>> (“foo”, “active”) = kCONFLICT
>>>>> (“foo”, “active”, “1-abc”) = true
>>>>> (“foo”, “active”, “1-def”) = kMISSING
>>>>> (“foo”, “owner”) = kCONFLICT
>>>>> (“foo”, “owner”, “1-abc”) = “alice”
>>>>> (“foo”, “owner”, “1-def”, “name”) = “bob”
>>>>> (“foo”, “owner”, “1-def”, “email”) = ...
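>>>>>
>>>>> A sketch of reading a single revision back out of that layout (the
>>>>> kCONFLICT / kMISSING sentinels and the tuple-packed values are
>>>>> placeholders; a real design would need unambiguous encodings):
>>>>>
>>>>>     import fdb
>>>>>     fdb.api_version(630)
>>>>>
>>>>>     KCONFLICT = fdb.tuple.pack(("kCONFLICT",))
>>>>>     KMISSING = fdb.tuple.pack(("kMISSING",))
>>>>>
>>>>>     @fdb.transactional
>>>>>     def read_one_rev(tr, docs, doc_id, rev):
>>>>>         # one range read; a conflict marker sorts before its subtree, so
>>>>>         # we can note it as we go and keep only the entries tagged `rev`
>>>>>         conflicted, fields = set(), {}
>>>>>         for k, v in tr.get_range_startswith(docs.pack((doc_id,))):
>>>>>             path = docs.unpack(k)[1:]
>>>>>             if v == KCONFLICT:
>>>>>                 conflicted.add(path)
>>>>>                 continue
>>>>>             for i in range(len(path) - 1, 0, -1):
>>>>>                 if path[:i] in conflicted:
>>>>>                     path = path[:i] + path[i + 1:] if path[i] == rev else None
>>>>>                     break
>>>>>             if path is not None and v != KMISSING:
>>>>>                 fields[path] = fdb.tuple.unpack(v)[0]
>>>>>         return fields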
>>>>> 
>>>>> Revision Metadata
>>>>> 
>>>>> * CouchDB uses a hash history for revisions
>>>>> ** Each edit is identified by the hash of the content of the edit
>>>>> including the base revision against which it was applied
>>>>> ** Individual edit branches are bounded in length but the number of
>>>>> branches is potentially unbounded
>>>>> 
>>>>> * Size limits preclude us from storing the entire key tree as a single
>>>>> value; in pathological situations
>>>>> the tree could exceed 100KB (each entry is > 16 bytes)
>>>>> 
>>>>> * Store each edit branch as a separate KV including deleted status in a
>>>>> special subspace
>>>>> 
>>>>> * Structure key representation so that “winning” revision can be
>>>>> automatically retrieved in a limit=1
>>>>> key range operation
>>>>> 
>>>>> (“foo”, “_meta”, “deleted=false”, 1, “def”) = []
>>>>> (“foo”, “_meta”, “deleted=false”, 4, “bif”) = [“3-baz”,”2-bar”,”1-foo”]
>>>>> <-- winner
>>>>> (“foo”, “_meta”, “deleted=true”, 3, “abc”) = [“2-bar”, “1-foo”]
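>>>>>
>>>>> A sketch of the winner lookup (illustrative; it spells the flag out as the
>>>>> strings used above, so it needs one reverse limit=1 read per flag value; an
>>>>> encoding that sorts "not deleted" last would collapse this to a single read
>>>>> over the whole _meta prefix):
>>>>>
>>>>>     import fdb
>>>>>     fdb.api_version(630)
>>>>>
>>>>>     @fdb.transactional
>>>>>     def winning_rev(tr, docs, doc_id):
>>>>>         # prefer the highest-positioned live branch, else a deleted one
>>>>>         for flag in ("deleted=false", "deleted=true"):
>>>>>             sub = docs[doc_id]["_meta"][flag]
>>>>>             r = sub.range()
>>>>>             for k, v in tr.get_range(r.start, r.stop, limit=1, reverse=True):
>>>>>                 pos, rev = sub.unpack(k)
>>>>>                 return flag, pos, rev, fdb.tuple.unpack(v)
>>>>>         return None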
>>>>> 
>>>>> Changes Feed
>>>>> 
>>>>> * FDB supports a concept called a versionstamp — a 10 byte, unique,
>>>>> monotonically (but not sequentially) increasing value for each committed
>>>>> transaction. The first 8 bytes are the committed version of the
>>>>> database. The last 2 bytes are monotonic in the serialization order for
>>>>> transactions.
>>>>> 
>>>>> * A transaction can specify a particular index into a key where the
>>>>> following 10 bytes will be overwritten by the versionstamp at commit
>>>>> time
>>>>> 
>>>>> * A subspace keyed on versionstamp naturally yields a _changes feed
>>>>> 
>>>>> by_seq subspace
>>>>> (“versionstamp1”) = (“foo”, “1-abc”)
>>>>> (“versionstamp4”) = (“bar”, “4-def”)
>>>>> 
>>>>> by_id subspace
>>>>> (“bar”, “_vsn”) = “versionstamp4”
>>>>> ...
>>>>> (“foo”, “_vsn”) = “versionstamp1”
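>>>>>
>>>>> A sketch of both halves (names are placeholders; the by_id bookkeeping that
>>>>> would let an update clear its previous by_seq entry is elided):
>>>>>
>>>>>     import fdb
>>>>>     fdb.api_version(630)
>>>>>
>>>>>     @fdb.transactional
>>>>>     def record_change(tr, by_seq, doc_id, rev):
>>>>>         # FDB fills in the 10-byte versionstamp at commit time, so each
>>>>>         # update lands at a unique, increasing position in the feed
>>>>>         key = by_seq.pack_with_versionstamp((fdb.tuple.Versionstamp(),))
>>>>>         tr.set_versionstamped_key(key, fdb.tuple.pack((doc_id, rev)))
>>>>>
>>>>>     @fdb.transactional
>>>>>     def changes_since(tr, by_seq, since=None):
>>>>>         # resume the feed from a previously seen key, or from the start
>>>>>         r = by_seq.range()
>>>>>         start = fdb.KeySelector.first_greater_than(since) if since else r.start
>>>>>         return [(k, fdb.tuple.unpack(v)) for k, v in tr.get_range(start, r.stop)]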
>>>>> 
>>>>> JSON Indexes
>>>>> 
>>>>> * “Mango” JSON indexes are defined by
>>>>> ** a list of field names, each of which may be nested,
>>>>> ** an optional partial_filter_selector which constrains the set of docs
>>>>> that contribute
>>>>> ** an optional name defined by the ddoc field (the name is auto-
>>>>> generated if not supplied)
>>>>> 
>>>>> * Store index definitions in a single subspace to aid query planning
>>>>> ** ((person,name), title, email) = (“name-title-email”, “{“student”:
>>>>> true}”)
>>>>> ** Store the values for each index in a dedicated subspace, adding the
>>>>> document ID as the last element in the tuple
>>>>> *** (“rosie revere”, “engineer”, “ro...@example.com", “foo”) = null
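>>>>>
>>>>> A sketch of maintaining and querying such an index (the field names and the
>>>>> empty value are placeholders, and applying the partial_filter_selector is
>>>>> left to the caller):
>>>>>
>>>>>     import fdb
>>>>>     fdb.api_version(630)
>>>>>
>>>>>     @fdb.transactional
>>>>>     def index_doc(tr, idx, doc_id, doc):
>>>>>         # one KV per document: (indexed values..., doc_id) -> null
>>>>>         key = idx.pack((doc.get("name"), doc.get("title"), doc.get("email"), doc_id))
>>>>>         tr[key] = b""
>>>>>
>>>>>     @fdb.transactional
>>>>>     def by_name(tr, idx, name):
>>>>>         # equality on the leading indexed field is a prefix range read
>>>>>         return [idx.unpack(k)[-1]
>>>>>                 for k, _ in tr.get_range_startswith(idx.pack((name,)))]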
>>>>> 
>>>>> B.
>>>>> 
>>>>> --
>>>>> Robert Samuel Newson
>>>>> rnew...@apache.org
>>>>> 
>>>>> On Mon, 4 Feb 2019, at 19:13, Ilya Khlopotov wrote:
>>>>>> 
>>>>>> I want to fix previous mistakes. I made two mistakes in the previous
>>>>>> calculations:
>>>>>> - I used 1Kb as the base size for calculating the expansion factor
>>>>>> (although we don't know the exact size of the original document)
>>>>>> - The expansion factor calculation included the number of revisions (it
>>>>>> shouldn't)
>>>>>> 
>>>>>> I'll focus on the flattened JSON docs model.
>>>>>> 
>>>>>> The following formula was used in the previous calculation:
>>>>>> storage_size_per_document=mapping_table_size*number_of_revisions +
>>>>>> depth*number_of_paths*number_of_revisions +
>>>>>> number_of_paths*value_size*number_of_revisions
>>>>>> 
>>>>>> To clarify things a little bit I want to calculate the space requirement
>>>>>> for a single revision this time.
>>>>>> mapping_table_size = number_of_field_names * (field_name_length + 4 (integer size))
>>>>>>   = 100 * (20 + 4) = 2400 bytes
>>>>>> storage_size_per_document_per_revision_per_replica = mapping_table_size +
>>>>>>   depth*number_of_paths + value_size*number_of_paths
>>>>>>   = 2400 bytes + 10*1000 + 1000*100 = 112400 bytes ~= 110 Kb
>>>>>> 
>>>>>> We can definitely reduce the requirement for the mapping table by adopting
>>>>>> rnewson's idea of a schema.
>>>>>> 
>>>>>> On 2019/02/04 11:08:16, Ilya Khlopotov <iil...@apache.org> wrote:
>>>>>>> Hi Michael,
>>>>>>> 
>>>>>>>> For example, here's a crazy thought:
>>>>>>>> Map every distinct occurrence of a key/value instance through a crypto
>>>>>>>> hash function to get a set of hashes.
>>>>>>>> 
>>>>>>>> These can be precomputed by Couch without any lookups in FDB. These
>>>>>>>> will be spread all over kingdom come in FDB and not lend themselves to
>>>>>>>> range search well.
>>>>>>>> 
>>>>>>>> So what you do is index them for frequency of occurring in the
>>>> same set.
>>>>>>>> In essence, you 'bucket them' statistically, and that bucket id
>>>> becomes a
>>>>>>>> key prefix. A crypto hash value can be copied into more than one
>>>> bucket.
>>>>>>>> The {bucket_id}/{cryptohash} becomes a {val_id}
>>>>>>> 
>>>>>>>> When writing a document, Couch submits the list/array of
>>>> cryptohash values
>>>>>>>> it computed to FDB and gets back the corresponding  {val_id} (the
>>>> id with
>>>>>>>> the bucket prefixed).  This can get somewhat expensive if there's
>>>> always a
>>>>>>>> lot of app local cache misses.
>>>>>>>> 
>>>>>>>> A document's value is then a series of {val_id} arrays up to 100k
>>>> per
>>>>>>>> segment.
>>>>>>>> 
>>>>>>>> When retrieving a document, you get the val_ids, find the distinct
>>>> buckets
>>>>>>>> and min/max entries for this doc, and then parallel query each
>>>> bucket while
>>>>>>>> reconstructing the document.
>>>>>>> 
>>>>>>> Interesting idea. Let's try to think it through to see if we can
>>>> make it viable.
>>>>>>> Let's go through hypothetical example. Input data for the example:
>>>>>>> - 1M of documents
>>>>>>> - each document is around 10Kb
>>>>>>> - each document consists of 1K of unique JSON paths
>>>>>>> - each document has 100 unique JSON field names
>>>>>>> - every scalar value is 100 bytes
>>>>>>> - 10% of unique JSON paths for every document already stored in
>>>> database under different doc or different revision of the current one
>>>>>>> - we assume 3 independent copies for every key-value pair in FDB
>>>>>>> - our hash key size is 32 bytes
>>>>>>> - let's assume we can determine if key is already on the storage
>>>> without doing query
>>>>>>> - 1% of paths is in cache (unrealistic value, in real live the
>>>> percentage is lower)
>>>>>>> - every JSON field name is 20 bytes
>>>>>>> - every JSON path is 10 levels deep
>>>>>>> - document key prefix length is 50
>>>>>>> - every document has 10 revisions
>>>>>>> Let's estimate the storage requirements and size of data we need to
>>>> transmit. The calculations are not exact.
>>>>>>> 1. storage_size_per_document (we cannot estimate exact numbers since
>>>> we don't know how FDB stores it)
>>>>>>> - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes) =
>>>> 38Kb * 10 * 3 = 1140 Kb (11x)
>>>>>>> 2. number of independent keys to retrieve on document read
>>>> (non-range queries) per document
>>>>>>> - 1K - (1K * 1%) = 990
>>>>>>> 3. number of range queries: 0
>>>>>>> 4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32
>>>> bytes) = 102 Kb (10x)
>>>>>>> 5. read latency (we use 2ms per read based on numbers from
>>>> https://apple.github.io/foundationdb/performance.html)
>>>>>>>   - sequential: 990*2ms = 1980ms
>>>>>>>   - range: 0
>>>>>>> Let's compare these numbers with initial proposal (flattened JSON
>>>> docs without global schema and without cache)
>>>>>>> 1. storage_size_per_document
>>>>>>> - mapping table size: 100 * (20 + 4(integer size)) = 2400 bytes
>>>>>>> - key size: (10 * (4 + 1(delimiter))) + 50 = 100 bytes
>>>>>>> - storage_size_per_document: 2.4K*10 + 100*1K*10 + 1K*100*10 =
>>>> 2024K = 1976 Kb * 3 = 5930 Kb (59.3x)
>>>>>>> 2. number of independent keys to retrieve: 0-2 (depending on index
>>>> structure)
>>>>>>> 3. number of range queries: 1 (1001 of keys in result)
>>>>>>> 4. data to transmit on read: 24K + 1000*100 + 1000*100 = 23.6 Kb
>>>> (2.4x)
>>>>>>> 5. read latency (we use 2ms per read based on numbers from
>>>> https://apple.github.io/foundationdb/performance.html and estimate range
>>>> read performance based on numbers from
>>>> https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test
>>>> )
>>>>>>> - range read performance: Given read performance is about 305,000
>>>> reads/second and range performance 3,600,000 keys/second we estimate range
>>>> performance to be 11.8x compared to read performance. If read performance
>>>> is 2ms then range performance is 0.169ms (which is hard to believe).
>>>>>>> - sequential: 2 * 2 = 4ms
>>>>>>> - range: 0.169
>>>>>>> 
>>>>>>> It looks like we are dealing with a tradeoff:
>>>>>>> - Map every distinct occurrence of a key/value instance through a
>>>> crypto hash:
>>>>>>> - 5.39x more disk space efficient
>>>>>>> - 474x slower
>>>>>>> - flattened JSON model
>>>>>>> - 5.39x less efficient in disk space
>>>>>>> - 474x faster
>>>>>>> 
>>>>>>> In any case this unscientific exercise was very helpful, since it
>>>>>>> uncovered the high cost in terms of disk space. 59.3x of original disk
>>>>>>> size is too much IMO.
>>>>>>> 
>>>>>>> Are there any ways we can make Michael's model more performant?
>>>>>>> 
>>>>>>> Also I don't quite understand a few aspects of the global hash table
>>>> proposal:
>>>>>>> 
>>>>>>> 1. > - Map every distinct occurence of a key/value instance through
>>>> a crypto hash function to get a set of hashes.
>>>>>>> I think we are talking only about scalar values here? I.e.
>>>> `"#/foo.bar.baz": 123`
>>>>>>> Since I don't know how we can make it work for all possible JSON
>>>> paths `{"foo": {"bar": {"size": 12, "baz": 123}}}":
>>>>>>> - foo
>>>>>>> - foo.bar
>>>>>>> - foo.bar.baz
>>>>>>> 
>>>>>>> 2. how to delete documents
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> ILYA
>>>>>>> 
>>>>>>> 
>>>>>>> On 2019/01/30 23:33:22, Michael Fair <mich...@daclubhouse.net>
>>>> wrote:
>>>>>>>> On Wed, Jan 30, 2019, 12:57 PM Adam Kocoloski <kocol...@apache.org
>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Michael,
>>>>>>>>> 
>>>>>>>>>> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
>>>>>>>>> 
>>>>>>>>> Yes that’s definitely one way to address storage of edit
>>>> conflicts. I
>>>>>>>>> think there are other, more compact representations that we can
>>>> explore if
>>>>>>>>> we have this “exploded” data model where each scalar value maps
>>>> to an
>>>>>>>>> individual KV pair.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I agree; as I mentioned on the original thread, I see a scheme that
>>>>>>>> handles both conflicts and revisions, where you only have to store
>>>> the most
>>>>>>>> recent change to a field.  Like you suggested, multiple revisions
>>>> can share
>>>>>>>> a key.  Which in my mind's eye further begs the conflicts/revisions
>>>>>>>> discussion along with the working within the limits discussion
>>>> because it
>>>>>>>> seems to me they are all intrinsically related as a "feature".
>>>>>>>> 
>>>>>>>> Saying 'We'll break documents up into roughly 80k segments', then
>>>> trying to
>>>>>>>> overlay some kind of field sharing scheme for revisions/conflicts
>>>> doesn't
>>>>>>>> seem like it will work.
>>>>>>>> 
>>>>>>>> I probably should have left out the trivial fix proposal as I
>>>> don't think
>>>>>>>> it's a feasible solution to actually use.
>>>>>>>> 
>>>>>>>> The comment is more regarding that I do not see how this thread
>>>> can escape
>>>>>>>> including how to store/retrieve conflicts/revisions.
>>>>>>>> 
>>>>>>>> For instance, the 'doc as individual fields' proposal lends itself
>>>> to value
>>>>>>>> sharing across mutiple documents (and I don't just mean revisions
>>>> of the
>>>>>>>> same doc, I mean the same key/value instance could be shared for
>>>> every
>>>>>>>> document).
>>>>>>>> However that's not really relevant if we're not considering the
>>>> amount of
>>>>>>>> shared information across documents in the storage scheme.
>>>>>>>> 
>>>>>>>> Simply storing documents in <100k segments (perhaps in some kind of
>>>>>>>> compressed binary representation) to deal with that FDB limit
>>>> seems fine.
>>>>>>>> The only reason to consider doing something else is because of its
>>>> impact
>>>>>>>> to indexing, searches, reduce functions, revisions, on-disk size
>>>> impact,
>>>>>>>> etc.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> I'm assuming the process will flatten the key paths of the
>>>> document into
>>>>>>>>> an array and then request the value of each key as multiple
>>>> parallel
>>>>>>>>> queries against FDB at once
>>>>>>>>> 
>>>>>>>>> Ah, I think this is not one of Ilya’s assumptions. He’s trying
>>>> to design a
>>>>>>>>> model which allows the retrieval of a document with a single
>>>> range read,
>>>>>>>>> which is a good goal in my opinion.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> I am not sure I agree.
>>>>>>>> 
>>>>>>>> Think of BitTorrent: a single range read should pull back the
>>>> structure of
>>>>>>>> the document (the pieces to fetch), but not necessarily the whole
>>>> document.
>>>>>>>> 
>>>>>>>> What if you already have a bunch of pieces in common with other
>>>> documents
>>>> locally (a repeated header/footer or type, for example); and you
>>>> only need
>>>>>>>> to get a few pieces of data you don't already have?
>>>>>>>> 
>>>>>>>> The real goal to Couch I see is to treat your document set like the
>>>>>>>> collection of structured information that it is.  In some respects
>>>> like an
>>>>>>>> extension of your application's heap space for structured objects
>>>> and
>>>>>>>> efficiently querying that collection to get back subsets of the
>>>> data.
>>>>>>>> 
>>>>>>>> Otherwise it seems more like a slightly upgraded file system plus
>>>> a fancy
>>>>>>>> grep/find like feature...
>>>>>>>> 
>>>>>>>> The best way I see to unlock more features/power is to a move
>>>> towards a
>>>>>>>> more granular and efficient way to store and retrieve the scalar
>>>> values...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> For example, here's a crazy thought:
>>>>>>>> Map every distinct occurrence of a key/value instance through a crypto
>>>>>>>> hash function to get a set of hashes.
>>>>>>>> 
>>>>>>>> These can be precomputed by Couch without any lookups in FDB. These
>>>>>>>> will be spread all over kingdom come in FDB and not lend themselves to
>>>>>>>> range search well.
>>>>>>>> 
>>>>>>>> So what you do is index them for frequency of occurring in the
>>>> same set.
>>>>>>>> In essence, you 'bucket them' statistically, and that bucket id
>>>> becomes a
>>>>>>>> key prefix. A crypto hash value can be copied into more than one
>>>> bucket.
>>>>>>>> The {bucket_id}/{cryptohash} becomes a {val_id}
>>>>>>>> 
>>>>>>>> When writing a document, Couch submits the list/array of
>>>> cryptohash values
>>>>>>>> it computed to FDB and gets back the corresponding  {val_id} (the
>>>> id with
>>>>>>>> the bucket prefixed).  This can get somewhat expensive if there's
>>>> always a
>>>>>>>> lot of app local cache misses.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> A document's value is then a series of {val_id} arrays up to 100k
>>>> per
>>>>>>>> segment.
>>>>>>>> 
>>>>>>>> When retrieving a document, you get the val_ids, find the distinct
>>>> buckets
>>>>>>>> and min/max entries for this doc, and then parallel query each
>>>> bucket while
>>>>>>>> reconstructing the document.
>>>>>>>> 
>>>>>>>> The values returned from the buckets query are the key/value
>>>> strings
>>>>>>>> required to reassemble this document.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----------
>>>>>>>> I put this forward primarily to highlight the idea that trying to match
>>>>>>>> the storage representation of documents in a straightforward way to FDB
>>>>>>>> keys to reduce query count might not be the most performance-oriented
>>>>>>>> approach.
>>>>>>>> 
>>>>>>>> I'd much prefer a storage approach that reduced data duplication
>>>> and
>>>>>>>> enabled fast sub-document queries.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> This clearly falls in the realm of what people want the 'use case'
>>>> of Couch
>>>>>>>> to be/become.  By giving Couch more access to sub-document
>>>> queries, I could
>>>>>>>> eventually see queries as complicated as GraphQL submitted to
>>>> Couch and
>>>>>>>> pulling back ad-hoc aggregated data across multiple documents in a
>>>> single
>>>>>>>> application layer request.
>>>>>>>> 
>>>>>>>> Hehe - one way to look at the database of Couch documents is that they are
>>>>>>>> all conflict revisions of the single root empty document. What I mean by
>>>>>>>> this is: consider thinking of the entire document store as one giant DAG of
>>>>>>>> key/value pairs, and how even separate documents are still typically related
>>>>>>>> to each other. For most applications there is a tremendous amount of data
>>>>>>>> redundancy between docs and especially between revisions of those docs...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> And all this is a long way of saying "I think there could be a lot
>>>> of value
>>>>>>>> in assuming documents are 'assembled' from multiple queries to
>>>> FDB, with
>>>>>>>> local caching, instead of simply retrieved"
>>>>>>>> 
>>>>>>>> Thanks, I hope I'm not the only outlier here thinking this way!?
>>>>>>>> 
>>>>>>>> Mike :-)
>>>>>>>> 
>>>>>>> 
>>>> 
>> 
