Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Michael Fair Wed, 30 Jan 2019 12:19:36 -0800

On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov <iil...@apache.org> wrote:


> FoundationDB Records layer uses global schema for JSON documents. They
> also have a nice way of creating indexes and schema evolution support.
> However this support comes at a cost of extra lookups in different
> subspace. With a local mapping table we are almost (except a corner case)
> certain that the schema and JSON fields would be collocated on a single
> node, due to the common prefix.
>

In general I think I prefer the global, but separate, key mapping idea, and
using FDB's "cache the important, frequently accessed data across
distributed memory" features.

I think I prefer the idea of indexing all documents' keys using the same
identifier set.  In general I think applications tend to reference some
keys far more than others, and giving those keys the same value in every
document could eventually prove useful for making many features faster and
easier than expected.

While I really like the independence and locality of a document local
mapping, when I think about the process of transforming a document's keys
into that mapping's values, I don't see a particular advantage regarding
where in the DB that key mapping came from.  I'm assuming the process will
flatten the key paths of the document into an array and then request the
value of each key as multiple parallel queries against FDB at once.  I
think regardless of whether the mapping is document local or global, having
FDB return those individual values is faster/easier than having Couch
range-fetch the mapping and do the translation work itself.
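
As a rough illustration of the flatten-then-translate step described above,
here is a minimal Python sketch. Everything in it is an assumption made for
illustration: the names flatten and encode_paths are hypothetical, a plain
dict (key_ids) stands in for the key mapping that would live in FDB, and
arrays are treated as leaf values to keep it short. Passing the same dict
for every document models the global mapping; passing a fresh dict per
document models the document-local one.

```python
def flatten(doc, prefix=()):
    """Yield (path, value) pairs for every leaf of a nested JSON object.

    Simplification: lists are treated as leaf values, not descended into.
    """
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, prefix + (key,))
    else:
        yield prefix, doc


def encode_paths(doc, key_ids):
    """Translate every key in every path into its integer identifier.

    key_ids is a hypothetical stand-in for the mapping stored in FDB.
    Unknown keys are assigned the next free identifier.
    """
    encoded = []
    for path, value in flatten(doc):
        ids = tuple(key_ids.setdefault(key, len(key_ids)) for key in path)
        encoded.append((ids, value))
    return encoded


# One shared mapping for all documents (the "global" variant).
key_ids = {}
doc1 = encode_paths({"name": "a", "address": {"city": "x", "name": "y"}}, key_ids)
doc2 = encode_paths({"name": "b"}, key_ids)
```

With the shared mapping, "name" gets the same identifier in both documents,
wherever it appears in the path.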

I could even see some periodic "reorganizing" engine that could renumber
frequently used keys to make the reverse transformation back into a value
that much faster.


> > Personally I wonder if the 10KB limit on field paths is anything more
> than a theoretical concern. It’s hard for me to imagine a useful schema
> that would get anywhere near that deep, but maybe I’m insufficiently
> creative :)


+1


> There’s certainly a storage overhead from repeating the upper portion of a
> path over and over again, but that’s also something the storage engine can
> optimize away through prefix elision. The current production storage engine
> in FoundationDB does not do this elision, but the new one in development
> does.
>

Assuming it only does "prefix" and not "segment", then I don't think this
will help because the DOCID for each key in JSON_PATH will be different,
making the "prefix" to each path across different documents distinct.  The
prefix matching engine will only be able to match up to the key element
before the DOCID.
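
To make the point concrete, here is a small Python sketch that counts how
many leading key elements two neighbouring keys share. The key layout
(subspace, DOCID, then the JSON path elements) is an assumption taken from
the discussion above, the names are hypothetical, and a real storage engine
would compare bytes rather than tuple elements.

```python
def shared_prefix_len(a, b):
    """Return the number of leading elements that keys a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Assumed key layout: (subspace, DOCID, path element, path element, ...)
keys = sorted([
    ("docs", "doc1", "address", "city"),
    ("docs", "doc1", "address", "zip"),
    ("docs", "doc2", "address", "city"),
    ("docs", "doc2", "address", "zip"),
])

# Shared prefix length for each pair of adjacent keys in sorted order.
shared = [shared_prefix_len(a, b) for a, b in zip(keys, keys[1:])]
```

Within one document the neighbouring keys share the subspace, the DOCID and
"address"; across the document boundary only the subspace is shared, even
though the paths after the DOCID are identical.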

Does/Could/Would the engine allow an app to use FDB itself to create a
mapping identifier for key "segments", or some other method to "skip past"
the distinct parts of keys and, in a sense, "reroot" the search?

If FDB were to "bake in" this "key segment mapping" idea as something it
exposed to the application layer, that'd be awesome!  Lots of applications
could probably make use of that.

Mike
