RE: [DISCUSS] : things we need to solve/decide : storing JSON documents

Reddy B . Fri, 01 Feb 2019 01:13:03 -0800

By the way, if the FDB migration were to happen, would CouchDB continue to be a 
schema-less database where we can just drop in our documents and map/reduce them 
without further ceremony?


I mean, for the long term, is there a commitment to keeping this feature? This 
is a big deal: it is the very basis of CouchDB, and I think it is the first 
assumption you make when you use CouchDB today.

I'm not trying to add toxicity to this very positive, constructive and 
high-quality discussion, just some humble feedback. As a user, when I see this 
being questioned, along with the other limitations introduced by FDB, I start 
to wonder whether "rebasing" is just a politically correct way of saying that 
CouchDB is being retired, since many once-core features now become optional 
extensions still to be implemented.

Which makes me wonder what the core is, and question the cost-benefit analysis 
of the switch in light of the current vision of the project. It is starting to 
look like FDB may be used not only as an implementation convenience but as a 
new vision for CouchDB (deprecating the former vision). Under a new vision the 
cost-benefit analysis would make sense, but such a change in vision has not 
been publicly announced.

And this would mean that today's core features are likely to go the way of 
Couchapps tomorrow if the vision has indeed changed. This uncertainty is very 
problematic for an end user thinking about long-term support for new projects. 
I totally appreciate that this is a dev mailing list where ideas are bounced 
around and technical details worked out, but it's important for us as users to 
see commitments on vision, hence my question. I also took this opportunity to 
voice the more general concern above.

But the specific question is: what's the vision for "schema-less" usage of 
CouchDB?

Thanks



________________________________
From: Ilya Khlopotov <iil...@apache.org>
Sent: Wednesday, January 30, 2019 22:08
To: dev@couchdb.apache.org
Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.

This approach would require inventing schema evolution features similar to 
those of the recently open-sourced Record Layer 
https://www.foundationdb.org/files/record-layer-paper.pdf
Because CouchDB is a NoSQL (i.e. schema-less) database, I am sure some CouchDB 
users do the following:
- rename fields
- reuse field names for something else when they update application
- remove fields
- have documents of different structure in one database

> I think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
In the case of a global mapping we would:
- call get_schema against a different subspace (i.e. contact different nodes)
- extract all scalar values by issuing an FDB range query (most likely all 
values are co-located)
- stitch the document together and return it to the user

In the case of a local mapping we don't need to call get_schema. The schema 
would be returned by the range query.

We would have to stitch the document together in either case.
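
The two read paths above can be sketched as follows. This is only an illustration under stated assumptions: OrderedKV is a hypothetical in-memory stand-in for an ordered key-value store (it is not the FoundationDB client API), and the key layouts and the function names (flatten, stitch, read_local, read_global) are made up for the example. Arrays and empty objects are ignored to keep it short.

```python
from bisect import bisect_left


class OrderedKV:
    """Hypothetical ordered key-value store; keys are tuples."""

    def __init__(self):
        self._keys = []          # kept sorted
        self._vals = {}
        self.range_reads = 0     # counts range queries issued

    def set(self, key, value):
        if key not in self._vals:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def get_range(self, prefix):
        """Return all (key, value) pairs whose key starts with prefix."""
        self.range_reads += 1
        out = []
        for k in self._keys[bisect_left(self._keys, prefix):]:
            if k[:len(prefix)] != prefix:
                break
            out.append((k, self._vals[k]))
        return out


def flatten(doc, prefix=()):
    """Map each scalar in a nested dict to its key path."""
    flat = {}
    for name, value in doc.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def stitch(pairs):
    """Rebuild a nested dict from (path, value) pairs."""
    doc = {}
    for path, value in pairs:
        node = doc
        for name in path[:-1]:
            node = node.setdefault(name, {})
        node[path[-1]] = value
    return doc


def write_local(kv, doc_id, doc):
    # Local mapping: schema and values share the document's key prefix.
    flat = flatten(doc)
    for field_id, path in enumerate(sorted(flat)):
        kv.set(("doc", doc_id, "schema", field_id), path)
        kv.set(("doc", doc_id, "value", field_id), flat[path])


def read_local(kv, doc_id):
    # One range query returns both the schema and the values.
    schema, values = {}, {}
    for key, val in kv.get_range(("doc", doc_id)):
        (schema if key[2] == "schema" else values)[key[3]] = val
    return stitch((schema[i], values[i]) for i in values)


def write_global(schema_kv, doc_kv, doc_id, doc):
    # Global mapping: field ids live in a separate subspace shared by all docs.
    for path, value in flatten(doc).items():
        field_id = schema_kv.get(("path", path))
        if field_id is None:
            field_id = schema_kv.get(("next",)) or 0
            schema_kv.set(("next",), field_id + 1)
            schema_kv.set(("path", path), field_id)
            schema_kv.set(("id", field_id), path)
        doc_kv.set(("doc", doc_id, field_id), value)


def read_global(schema_kv, doc_kv, doc_id):
    # Extra lookup (get_schema) in another subspace, then the value range query.
    schema = {key[1]: path for key, path in schema_kv.get_range(("id",))}
    values = doc_kv.get_range(("doc", doc_id))
    return stitch((schema[key[2]], value) for key, value in values)
```

In this sketch read_local issues a single range query, while read_global issues one against the schema store and one against the document store; both end with the same stitch step.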

Can you elaborate if my understanding is not correct (I didn't quite understand 
the "Couch Range fetch" part of your question)?

best regards,
iilyak

On 2019/01/30 20:11:18, Michael Fair <mich...@daclubhouse.net> wrote:
> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov <iil...@apache.org> wrote:
>
> > The FoundationDB Record Layer uses a global schema for JSON documents. It
> > also has a nice way of creating indexes and schema evolution support.
> > However, this support comes at the cost of extra lookups in a different
> > subspace. With a local mapping table we are almost (except in a corner
> > case) certain that the schema and JSON fields would be collocated on a
> > single node, due to the common prefix.
> >
>
> In general I think I prefer the global, but separate, key mapping idea and
> use FDB's "cache the important, frequently accessed data, across
> distributed memory" features.
>
> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.
>
> While I really like the independence and locality of a document local
> mapping, when I think about the process of transforming a document's keys
> into that mapping's values, I don't see a particular advantage regarding
> where in the DB that key mapping came from.  I'm assuming the process will
> flatten the key paths of the document into an array and then request the
> value of each key as multiple parallel queries against FDB at once.  I
> think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
>
> I could even see some periodic "reorganizing" engine that could renumber
> frequently used keys to make the reverse transformation back into a value
> that much faster.
>
>
> > > Personally I wonder if the 10KB limit on field paths is anything more
> > > than a theoretical concern. It’s hard for me to imagine a useful schema
> > > that would get anywhere near that deep, but maybe I’m insufficiently
> > > creative :)
>
>
> +1
>
>
> > There’s certainly a storage overhead from repeating the upper portion of a
> > path over and over again, but that’s also something the storage engine can
> > optimize away through prefix elision. The current production storage engine
> > in FoundationDB does not do this elision, but the new one in development
> > does.
> >
>
> Assuming it only does "prefix" and not "segment", then I don't think this
> will help because the DOCID for each key in JSON_PATH will be different,
> making the "prefix" to each path across different documents distinct.  The
> prefix matching engine will only be able to match up to the key element
> before the DOCID.
>
> Does/Could/Would the engine allow an app to use FDB itself to create a
> mapping identifier for key "segments" or some other method to "skip past"
> the distinct parts of keys to in a sense "reroot" the search?
>
> If FDB was to "bake in" this "key segment mapping" idea as something it
> exposed to the application layer; that'd be awesome!  Lots of applications
> could probably make use of that.
>
> Mike
>
