On Fri, Aug 16, 2013 at 9:58 PM, Alexander Shorin <kxe...@gmail.com> wrote:

> On Fri, Aug 16, 2013 at 11:23 PM, Jason Smith <j...@apache.org> wrote:
> > On Fri, Aug 16, 2013 at 4:49 PM, Volker Mische <volker.mis...@gmail.com> wrote:
> >>
> >> On 08/16/2013 11:32 AM, Alexander Shorin wrote:
> >> > On Fri, Aug 16, 2013 at 1:12 PM, Benoit Chesneau <bchesn...@gmail.com> wrote:
> >> >> I agree (modulo the fact that I would replace a string with a binary ;)),
> >> >> but that would only be possible if we extract the metadata (_id, _rev)
> >> >> from the JSON so CouchDB wouldn't have to decode the JSON to get them.
> >> >> Streaming JSON would also allow that, but since there is no guarantee
> >> >> of property order in a JSON object, it would be less efficient.
> >> >
> >> > What if we split document metadata from document itself?
> >
> >
> > I would like to hear the goal of this effort. What is the definition of
> > success and failure?
>
> Idea: move document metadata into a separate object.
>

How would you link the metadata to the separate object? Would you let the
application set the internal links?

I'm +1 on the idea anyway.



> Motivation:
>
> Case 1: Small docs. No gain at all. Moreover, it's probably better not
> to split them at all, e.g. pass the full doc if its size is below some
> limit of a few megabytes.
> Case 2: Large docs. There is a gain when you have put the right fields
> into the metadata (doc type, authorship, tags, etc.) and filter by this
> metadata first: you get a minimal memory footprint and less CPU load,
> and the "fast accept, fast reject" rule works perfectly.
>
> Side effect: it's possible to filter by metadata first and keep only
> the ids of the documents that need processing. And if we know what and
> how many documents to process, we can make assumptions about parallel
> indexing.
>
> Side effect: it's possible to autoindex metadata on the fly on document
> update without asking the user to write views (meta/by_type,
> meta/by_author, meta/by_update_time, etc.). Of course, the more metadata
> you have, the larger the base index will be. In 80% of cases it will be
> no more than 4KB.
>
> Summary: I've probably just described a chained views feature with
> autoindexing of certain fields (:
> Setting the autoindexing feature aside, we could make the view building
> process much faster with the right chain of views, which would use set
> algebra operations to calculate the target doc ids to pass to the final
> view, i.e. reduce the set of docs before computing map results:
>
> {
>   "views": {
>     "posts": {"map": "...", "reduce": "..."},
>     "chain": [
>       ["by_type", {"key": "post"}],
>       ["hidden", {"key": false}],
>       ["by_domain", {"keys": ["public", "wiki"]}]
>     ]
>   }
> }
>
> In a db of 10000 docs with 1200 posts, of which 200 are hidden and 400
> are private, the resulting posts view has to process only 600 docs
> instead of 10000, and finding those docs is just an index lookup. Of
> course, calling such a view triggers all the views in the chain. And
> I'm not considering cross dependencies and loops for now.
>
> --
> ,,,^..^,,,
>
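
A minimal Python sketch of how the proposed chain could be resolved, assuming each chained view is already an index mapping a key to the set of doc ids that emit it, that chain steps are combined by intersection, and that "keys" means the union of several keys. The names build_index and resolve_chain and the generated documents are hypothetical, not an existing CouchDB API; the synthetic data only reproduces the 10000 / 1200 / 200 / 400 numbers from the example above.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical types: a document is a plain dict of metadata fields, and an
# index maps each key of a chained view to the ids of the docs emitting it.
Doc = Dict[str, object]
KeyIndex = Dict[object, Set[str]]


def build_index(docs: Dict[str, Doc], field: str) -> KeyIndex:
    """Index one metadata field: value -> ids of the docs having that value."""
    index: KeyIndex = {}
    for doc_id, doc in docs.items():
        if field in doc:
            index.setdefault(doc[field], set()).add(doc_id)
    return index


def resolve_chain(indexes: Dict[str, KeyIndex],
                  chain: List[Tuple[str, dict]]) -> Set[str]:
    """Intersect the doc-id sets selected by every step of the chain.

    "key" selects a single key; "keys" selects the union of several keys.
    """
    result: Optional[Set[str]] = None
    for view, query in chain:
        keys = query["keys"] if "keys" in query else [query["key"]]
        selected: Set[str] = set()
        for key in keys:
            selected |= indexes[view].get(key, set())  # union across keys
        result = selected if result is None else result & selected  # intersect steps
    return result if result is not None else set()


# Synthetic data mirroring the example: 10000 docs, 1200 posts,
# 200 hidden posts and 400 private posts (no overlap between the two).
docs: Dict[str, Doc] = {}
for i in range(10000):
    doc: Doc = {"type": "post" if i < 1200 else "other"}
    if i < 1200:
        doc["hidden"] = i < 200
        if 200 <= i < 600:
            doc["domain"] = "private"
        else:
            doc["domain"] = "wiki" if i % 2 else "public"
    docs[f"doc{i}"] = doc

indexes = {
    "by_type": build_index(docs, "type"),
    "hidden": build_index(docs, "hidden"),
    "by_domain": build_index(docs, "domain"),
}
chain = [
    ("by_type", {"key": "post"}),
    ("hidden", {"key": False}),
    ("by_domain", {"keys": ["public", "wiki"]}),
]

targets = resolve_chain(indexes, chain)
print(len(targets))  # 600 docs left for the final "posts" map function
```

In this sketch each step is a plain set operation over precomputed indexes, so narrowing the candidates never touches the document bodies; only the surviving ids would be handed to the final map function.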
