On Fri, Aug 16, 2013 at 9:58 PM, Alexander Shorin <kxe...@gmail.com> wrote:
> On Fri, Aug 16, 2013 at 11:23 PM, Jason Smith <j...@apache.org> wrote:
> > On Fri, Aug 16, 2013 at 4:49 PM, Volker Mische <volker.mis...@gmail.com>
> > wrote:
> >>
> >> On 08/16/2013 11:32 AM, Alexander Shorin wrote:
> >> > On Fri, Aug 16, 2013 at 1:12 PM, Benoit Chesneau <bchesn...@gmail.com>
> >> > wrote:
> >> >> I agree (modulo the fact that I would replace a string with a
> >> >> binary ;) but that would only be possible if we extract the
> >> >> metadata (_id, _rev) from the JSON, so CouchDB wouldn't have to
> >> >> decode the JSON to get them. Streaming JSON would also allow that,
> >> >> but since there is no guarantee of property order in JSON, it
> >> >> would be less efficient.
> >> >
> >> > What if we split document metadata from document itself?
> >
> >
> > I would like to hear a goal for this effort. What is the definition of
> > success and failure?
>
> Idea: move document metadata into a separate object.
>

How do you link the metadata to the separate object there? Do you let the
application set the internal links? I'm +1 on such an idea anyway.

> Motivation:
>
> Case 1: Small docs. No profit at all. Moreover, it's probably better
> not to split things there, e.g. pass the full doc if its size is around
> some number of megabytes.
> Case 2: Large docs. Profit when you have put the right fields into the
> metadata (like doc type, authorship, tags etc.) and filter first by
> this metadata: you have a minimal memory footprint, you have less CPU
> load, and the "fast accept - fast reject" rule works perfectly.
>
> Side effect: it's possible to filter by metadata first and keep only
> the ids of the documents that need processing. And if we know what and
> how many to process, we can make assumptions about parallel indexing.
>
> Side effect: it's possible to autoindex metadata on the fly on document
> update without asking the user to write views (meta/by_type,
> meta/by_author, meta/by_update_time etc.). Sure, the more metadata you
> have, the larger the base index will be.

In 80% of cases it will be no more than 4KB.

>
> Summary: I've probably just described a chained views feature with
> autoindexing by certain fields (:
> Dropping the autoindexing feature, we could make the view building
> process much faster if we build the right view chain, which uses set
> algebra operations to calculate the target doc ids to pass to the final
> view, i.e. reduce the docs before mapping results:
>
> {
>   "views": {
>     "posts": {"map": "...", "reduce": "..."},
>     "chain": [
>       ["by_type", {"key": "post"}],
>       ["hidden", {"key": false}],
>       ["by_domain", {"keys": ["public", "wiki"]}]
>     ]
>   }
> }
>
> In the case of a 10000-doc db with 1200 posts, of which 200 are hidden
> and 400 are private, the resulting posts view has to process only 600
> docs instead of 10000, and finding the docs to pass is an index lookup
> operation. Sure, calling such a view triggers all views in the chain.
> And I'm not thinking about cross dependencies and loops for now.
>
> --
> ,,,^..^,,,
>
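
To make the set algebra behind the quoted chain concrete, here is a rough
Python sketch. It is only a toy model, not CouchDB code: each view in the
chain is modeled as a dict from emitted key to a set of doc ids, and the
names lookup, resolve_chain, build_indexes and sample_docs are hypothetical.
The sample data is invented to match the quoted numbers and assumes that
hidden posts and private posts do not overlap.

```python
# Toy model of a view chain: every chain step is an index lookup, and the
# steps are combined with set intersection. Nothing here is a CouchDB API.

CHAIN = [
    ["by_type", {"key": "post"}],
    ["hidden", {"key": False}],
    ["by_domain", {"keys": ["public", "wiki"]}],
]


def lookup(index, key=None, keys=None):
    """Return the doc ids stored under `key`, or under any of `keys`."""
    wanted = [key] if keys is None else keys
    ids = set()
    for k in wanted:
        ids |= index.get(k, set())
    return ids


def resolve_chain(indexes, chain):
    """Intersect the lookups of all chain steps to get the target doc ids."""
    result = None
    for view_name, params in chain:
        ids = lookup(indexes[view_name], **params)
        result = ids if result is None else result & ids
    return result if result is not None else set()


def build_indexes(docs):
    """Build the three hypothetical metadata indexes used by CHAIN."""
    indexes = {"by_type": {}, "hidden": {}, "by_domain": {}}
    for doc in docs:
        indexes["by_type"].setdefault(doc["type"], set()).add(doc["_id"])
        indexes["hidden"].setdefault(doc["hidden"], set()).add(doc["_id"])
        indexes["by_domain"].setdefault(doc["domain"], set()).add(doc["_id"])
    return indexes


def sample_docs():
    """10000 docs: 1200 posts, 200 of them hidden, 400 of them private."""
    docs = []
    for i in range(10000):
        doc = {"_id": i, "type": "comment", "hidden": False, "domain": "public"}
        if i < 1200:
            doc["type"] = "post"
            if i < 200:
                doc["hidden"] = True        # hidden posts
            elif i < 600:
                doc["domain"] = "private"   # private posts
            elif i >= 900:
                doc["domain"] = "wiki"      # the rest is public or wiki
        docs.append(doc)
    return docs


if __name__ == "__main__":
    targets = resolve_chain(build_indexes(sample_docs()), CHAIN)
    print(len(targets))  # 600
```

Only the ids in targets would then be handed to the final "posts" map
function. The real cost depends on how the intermediate indexes are stored
and kept up to date, which this sketch ignores.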