Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Ilya Khlopotov Wed, 30 Jan 2019 13:09:09 -0800

> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.


This approach would require inventing schema evolution features similar to 
those in the recently open-sourced Record Layer 
https://www.foundationdb.org/files/record-layer-paper.pdf
I am sure some CouchDB users do the following (because CouchDB is a NoSQL, 
i.e. schema-less, database):
- rename fields
- reuse field names for something else when they update application
- remove fields
- have documents of different structure in one database
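
To illustrate why these habits matter for a global mapping, here is a toy 
sketch. FieldRegistry and encode are hypothetical names made up for this 
example (not Record Layer or CouchDB code), and only top-level fields are 
handled. It shows a naive append-only registry accumulating entries as field 
names change and document shapes vary.

```python
# Hypothetical, naive global registry: field name -> small integer id.
# This is an illustration only, not an existing CouchDB or FDB API.
class FieldRegistry:
    def __init__(self):
        self.ids = {}

    def id_for(self, name):
        # Allocate the next id the first time a field name is seen.
        return self.ids.setdefault(name, len(self.ids))


def encode(registry, doc):
    """Replace top-level field names with registry ids."""
    return {registry.id_for(k): v for k, v in doc.items()}


registry = FieldRegistry()
v1 = encode(registry, {"name": "Ann", "phone": "555"})
# The application renames "phone" to "mobile".
v2 = encode(registry, {"name": "Ann", "mobile": "555"})
# A differently structured document stored in the same database.
v3 = encode(registry, {"sku": 7})
```

In this toy model the entry for "phone" stays in the registry after the 
rename, which is the kind of bookkeeping schema evolution support would have 
to handle.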

> I think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
In the case of a global mapping we would:
- call get_schema against a different subspace (i.e. contact different nodes)
- extract all scalar values by issuing an FDB range query (most likely all 
values are co-located)
- stitch the document together and return it to the user

In the case of a local mapping we don't need to call get_schema. The schema 
would be returned by the range query.

We would have to stitch the document together in either case.
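
A minimal sketch of the two read paths, assuming a plain Python dict with 
tuple keys as a stand-in for FDB. The key layouts and all function names 
(range_read, stitch, read_doc_global, read_doc_local) are assumptions made up 
for illustration; none of this is the FoundationDB API.

```python
# Toy model only: a dict with tuple keys stands in for FoundationDB.
# Key layouts and function names are hypothetical.

def range_read(store, prefix):
    """Return (key, value) pairs whose key starts with prefix, in key order."""
    n = len(prefix)
    hits = [kv for kv in store.items() if kv[0][:n] == prefix]
    return sorted(hits, key=lambda kv: kv[0])


def stitch(pairs):
    """Rebuild a nested dict from (path_tuple, scalar) pairs."""
    doc = {}
    for path, value in pairs:
        node = doc
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return doc


def read_doc_global(store, doc_id):
    # 1. get_schema: a separate subspace, so possibly a different node.
    schema = {k[1]: v for k, v in range_read(store, ("schema",))}
    # 2. One range query for all scalar values of the document.
    rows = range_read(store, ("docs", doc_id))
    # 3. Translate field ids back to names and stitch.
    return stitch([(tuple(schema[i] for i in k[2:]), v) for k, v in rows])


def read_doc_local(store, doc_id):
    # One range query returns both the mapping and the values,
    # because they share the ("docs", doc_id) prefix.
    rows = range_read(store, ("docs", doc_id))
    schema = {k[3]: v for k, v in rows if k[2] == "map"}
    values = [(k[3:], v) for k, v in rows if k[2] == "val"]
    return stitch([(tuple(schema[i] for i in p), v) for p, v in values])


GLOBAL_STORE = {
    ("schema", 0): "name", ("schema", 1): "address", ("schema", 2): "city",
    ("docs", "d1", 0): "a", ("docs", "d1", 1, 2): "x",
}
LOCAL_STORE = {
    ("docs", "d1", "map", 0): "name", ("docs", "d1", "map", 1): "address",
    ("docs", "d1", "map", 2): "city",
    ("docs", "d1", "val", 0): "a", ("docs", "d1", "val", 1, 2): "x",
}
```

Both read_doc_global(GLOBAL_STORE, "d1") and read_doc_local(LOCAL_STORE, "d1") 
rebuild the same document. The only difference in this sketch is the extra 
read of the schema subspace in the global case.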

Can you elaborate if my understanding is not correct (I didn't quite understand 
the "Couch Range fetch" part of your question)?

best regards,
iilyak

On 2019/01/30 20:11:18, Michael Fair <[email protected]> wrote: 
> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov <[email protected]> wrote:
> 
> > The FoundationDB Record Layer uses a global schema for JSON documents. They
> > also have a nice way of creating indexes and schema evolution support.
> > However, this support comes at the cost of extra lookups in a different
> > subspace. With a local mapping table we are almost certain (except for a
> > corner case) that the schema and JSON fields would be colocated on a
> > single node, due to the common prefix.
> >
> 
> In general I think I prefer the global, but separate, key mapping idea and
> use FDB's "cache the important, frequently accessed data, across
> distributed memory" features.
> 
> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.
> 
> While I really like the independence and locality of a document local
> mapping, when I think about the process of transforming a document's keys
> into that mapping's values, I don't see a particular advantage regarding
> where in the DB that key mapping came from.  I'm assuming the process will
> flatten the key paths of the document into an array and then request the
> value of each key as multiple parallel queries against FDB at once.  I
> think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
> 
> I could even see some periodic "reorganizing" engine that could renumber
> frequently used keys to make the reverse transformation back into a value
> that much faster.
> 
> 
> > > Personally I wonder if the 10KB limit on field paths is anything more
> > > than a theoretical concern. It’s hard for me to imagine a useful schema
> > > that would get anywhere near that deep, but maybe I’m insufficiently
> > > creative :)
> 
> 
> +1
> 
> 
> > > There’s certainly a storage overhead from repeating the upper portion of a
> > > path over and over again, but that’s also something the storage engine can
> > > optimize away through prefix elision. The current production storage engine
> > > in FoundationDB does not do this elision, but the new one in development
> > > does.
> >
> 
> Assuming it only does "prefix" and not "segment", then I don't think this
> will help because the DOCID for each key in JSON_PATH will be different,
> making the "prefix" to each path across different documents distinct.  The
> prefix matching engine will only be able to match up to the key element
> before the DOCID.
> 
> Does/Could/Would the engine allow an app to use FDB itself to create a
> mapping identifier for key "segments" or some other method to "skip past"
> the distinct parts of keys to in a sense "reroot" the search?
> 
> If FDB was to "bake in" this "key segment mapping" idea as something it
> exposed to the application layer; that'd be awesome!  Lots of applications
> could probably make use of that.
> 
> Mike
>
