On 13-08-18 09:33 AM, Alexander Shorin wrote:
On Sun, Aug 18, 2013 at 3:54 PM, Volker Mische <[email protected]> wrote:
On 08/18/2013 08:42 AM, Alexander Shorin wrote:
On Sun, Aug 18, 2013 at 10:22 AM, Benoit Chesneau <[email protected]> wrote:
On Fri, Aug 16, 2013 at 9:58 PM, Alexander Shorin <[email protected]> wrote:

On Fri, Aug 16, 2013 at 11:23 PM, Jason Smith <[email protected]> wrote:
On Fri, Aug 16, 2013 at 4:49 PM, Volker Mische <[email protected]> wrote:
On 08/16/2013 11:32 AM, Alexander Shorin wrote:
On Fri, Aug 16, 2013 at 1:12 PM, Benoit Chesneau <[email protected]> wrote:
I agree (modulo the fact that I would replace a string by a binary ;)
but that would only be possible if we extract the metadata (_id, _rev)
from the JSON, so CouchDB wouldn't have to decode the JSON to get them.
Streaming JSON would also allow that, but since there is no guarantee of
property order in JSON it would be less efficient.
What if we split the document metadata from the document itself?

I would like to hear a goal for this effort. What is the definition of
success and failure?
Idea: move the document metadata into a separate object.

How do you link the metadata to the separate object there? Do you let the
application set the internal links?

I'm +1 on this idea anyway.
Mmm... here is how I imagine it (Disclaimer: I'm sure I'm wrong on the details here!):

Btree:

     ----+----
    |        |
  --+--    --+--
  |   |    |   |
  *   *    *   *

At the node we have a doc object {...} for a specific revision. Instead of
this, we'll have a tuple ({...}, {...}): the first is the meta, the second
is the data.
So I think internal links wouldn't be needed, since meta and data
would live within the same Btree node.
For regular doc requests, they would be merged (is the `_` prefix still
needed to avoid collisions?) and returned as a single {...} as always.
We could also return them as separate objects, so the view function
becomes: function(doc, meta) {}.
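
To make the two options above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names `mergeDoc` and `mapPost`, the meta fields `id` and `rev`, and passing `emit` as a parameter are hypothetical, not CouchDB's actual storage layout or view server API.

```javascript
// Hypothetical stored tuple [meta, data], living in the same Btree node.
const stored = [
  { id: "post-1", rev: "1-abc" },      // meta (assumed field names)
  { title: "Hello", type: "post" }     // data (user-supplied body)
];

// Option 1: merge for a regular doc request. Meta keys get the `_`
// prefix so they cannot collide with ordinary user fields.
function mergeDoc([meta, data]) {
  const merged = {};
  for (const key of Object.keys(meta)) {
    merged["_" + key] = meta[key];
  }
  return Object.assign(merged, data);
}

// Option 2: hand both parts to the view function separately.
// `emit` is passed in here only to keep the sketch self-contained.
function mapPost(doc, meta, emit) {
  if (doc.type === "post") {
    emit(meta.id, doc.title);
  }
}

const postRows = [];
mapPost(stored[1], stored[0], (key, value) => postRows.push([key, value]));
```

With the separate-object form the `_` prefix is no longer needed inside the view function, since `meta.id` and a user field named `id` live in different objects.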

Couchbase does that and from my experience it works well and feels right.
Oh, so this idea even works (:

However, the trick was to not pass the doc part (in case it is big
enough) to the view server until the view server has processed its
metadata. Without that, this is still a good feature, but it wouldn't
help speed up indexing. To recap the trick: first process the meta part
and, if it passes, load the doc. Later I sent another mail where I
eventually reinvented chained views. Since the trick with meta does
exactly the same thing, chained views are the more correct way to go.
See the quote at the end with the summary.
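
A rough sketch of that trick, under stated assumptions: `makeRecord`, `loadBody` and `indexRecords` are hypothetical names, and the lazy loader only stands in for whatever would really fetch and decode the doc body.

```javascript
// Hypothetical record: meta is cheap to read, the body sits behind a
// loader so it is only fetched when actually needed.
function makeRecord(meta, body) {
  let loads = 0;
  return {
    meta,
    loadBody() { loads += 1; return body; },
    get loads() { return loads; }      // counts how often the body was loaded
  };
}

// Step 1: filter on meta alone. Step 2: load the body only for matches.
function indexRecords(records, metaFilter, mapFn) {
  const rows = [];
  for (const record of records) {
    if (!metaFilter(record.meta)) continue;   // body is never touched
    mapFn(record.loadBody(), record.meta, (k, v) => rows.push([k, v]));
  }
  return rows;
}

const records = [
  makeRecord({ id: "a", type: "post" }, { title: "First" }),
  makeRecord({ id: "b", type: "comment" }, { text: "Nice" }),
  makeRecord({ id: "c", type: "post" }, { title: "Second" })
];

const lazyRows = indexRecords(
  records,
  (meta) => meta.type === "post",
  (doc, meta, emit) => emit(meta.id, doc.title)
);
```

Here the body of record "b" is never loaded, which is the whole point: the cost of a large doc is only paid when its meta has already passed the filter.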

Anyway, I feel we should borrow from Couchbase's experience with the
document metadata object (of course if they wouldn't sue us for that ((: )
since everyone already has some preferred metadata fields (like type)
or uses a special object for that to avoid polluting the main document body.
I prefer a special '.meta' object at the document root which holds
document type info, authorship, timestamps, bindings, etc.
It's a good feature to have whether or not it optimizes the indexing
process (:

I would suggest either prefixing with an underscore, or using a separate object passed to the view server.

If someone (such as myself) has a great many documents that happen to contain a "meta" attribute, it would be non-trivial to upgrade or migrate. A migration script could be written, of course, although it wouldn't be ideal.

Something to consider: it may be worthwhile to simply use obj._meta instead of .meta.


Below is about chained views:

On Fri, Aug 16, 2013 at 11:58 PM, Alexander Shorin <[email protected]> wrote:
Summary: I have probably just described a chained views feature with
autoindexing by certain fields (:
Dropping the autoindexing feature, we could make the view building process
much faster with the right chain of views, which uses set algebra
operations to calculate the target doc ids to pass to the final view,
i.e. reduce the set of docs before mapping them:

{
  "views": {
    "posts": {"map": "...", "reduce": "..."},
    "chain": [
      ["by_type", {"key": "post"}],
      ["hidden", {"key": false}],
      ["by_domain", {"keys": ["public", "wiki"]}]
    ]
  }
}

For a db of 10000 docs with 1200 posts, of which 200 are hidden and 400
are private, the resulting posts view has to process only 600 docs instead
of 10000, and finding the docs to pass is just an index lookup operation.
Of course, calling such a view triggers all the views in the chain.
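
The set algebra behind those numbers can be sketched as follows. This is only an illustration under assumptions: the in-memory `Map` indexes stand in for real view indexes, `lookup` and `resolveChain` are hypothetical helpers, and the hidden and private posts are assumed not to overlap (which is what 1200 - 200 - 400 = 600 implies).

```javascript
// Hypothetical stand-ins for view indexes: key -> Set of doc ids.
const byType = new Map(), hidden = new Map(), byDomain = new Map();
function add(index, key, id) {
  if (!index.has(key)) index.set(key, new Set());
  index.get(key).add(id);
}
// 10000 docs: 1200 posts, 200 of them hidden, another 400 private.
for (let i = 0; i < 10000; i++) {
  const id = "doc-" + i;
  const isPost = i < 1200;
  add(byType, isPost ? "post" : "other", id);
  add(hidden, isPost && i < 200, id);
  add(byDomain, isPost && i >= 200 && i < 600 ? "private" : "public", id);
}

// One chain step: union of the id sets for "key" or for each of "keys".
function lookup(index, query) {
  const keys = "keys" in query ? query.keys : [query.key];
  const ids = new Set();
  for (const key of keys) {
    for (const id of index.get(key) || []) ids.add(id);
  }
  return ids;
}

// Intersect all steps; only the surviving ids reach the final view.
function resolveChain(indexes, chain) {
  let result = null;
  for (const [viewName, query] of chain) {
    const ids = lookup(indexes[viewName], query);
    result = result === null
      ? ids
      : new Set([...result].filter((id) => ids.has(id)));
  }
  return result === null ? new Set() : result;
}

const chain = [
  ["by_type", { key: "post" }],
  ["hidden", { key: false }],
  ["by_domain", { keys: ["public", "wiki"] }]
];
const targetIds = resolveChain(
  { by_type: byType, hidden: hidden, by_domain: byDomain }, chain);
```

With this data `targetIds.size` is 600, so the final `posts` view would only be fed 600 docs instead of 10000.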


Chained views would be awesome! I'm sure I'm not alone in having solved this problem by using multiple queries and matching document IDs.
