Re: Partitioned Clusters

Chris Anderson Thu, 19 Feb 2009 19:13:10 -0800

On Thu, Feb 19, 2009 at 6:39 PM, Ben Browning <[email protected]> wrote:
> Overall the model sounds very similar to what I was thinking. I just
> have a few comments.
>
>> In this model documents are saved to a leaf node depending on a hash
>> of the docid. This means that lookups are easy, and need only to touch
>> the leaf node which holds the doc. Redundancy can be provided by
>> maintaining R replicas of every leaf node.
>
> There are several use-cases where a true hash of the docid won't be the
> optimal partitioning key. The simple case is where you want to partition
> your data by user and in most non-trivial cases you won't be storing
> all of a user's data under one document with the user's id as the docid.
> A fairly simple solution would be allowing the developer to specify a
> JavaScript function somewhere (not sure where this should live...) that
> takes a docid and spits out a partition key. Then I could just prefix all
> my doc ids for a specific user with that user's id and write the
> appropriate partition function.
>
>>
>> View queries, on the other hand, must be handled by every node. The
>> requests are proxied down the tree to leaf nodes, which respond
>> normally. Each proxy node then runs a merge sort algorithm (which
>> needs space proportional only to the number of input streams, not
>> the number of rows) on the view results. This can happen recursively
>> if the tree is deep.
>
> If the developer has control over partition keys as suggested above, it's
> entirely possible to have applications where view queries only need data
> from one partition. It would be great if we could do something smart here or
> have a way for the developer to indicate to Couch that all the data should
> be on only one partition.
>
> These are just nice-to-have features and the described cluster setup could
> still be extremely useful without them.


I think they are both sensible optimizations. Damien has described the
JS partition function before on IRC, so I think it fits into the
model. As for restricting view queries to just those docs within a
particular id range, it might make sense to partition by giving each
user their own database, rather than putting logic on the docid. In the
case where you need data in a single db, but still have some queries
that can be partitioned, it's still a good optimization. Luckily, even
in the unoptimized case, if a node has no rows to contribute to the
final view result then it should have a low impact on the total
resources needed to generate the result.
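
To make the partition function idea concrete, here is a rough sketch.
It is in Python rather than JS purely for illustration, and everything
in it is an assumption on my part: the "<user_id>:<rest>" docid
convention, the names partition_key and leaf_for, and the choice of MD5
as the hash. Nothing like this exists in Couch today.

```python
import hashlib


def partition_key(doc_id: str) -> str:
    """Hypothetical partition function: use the docid prefix as the key.

    Assumes docids look like '<user_id>:<rest>'. A docid without a
    colon is used whole as its own partition key.
    """
    return doc_id.split(":", 1)[0]


def leaf_for(doc_id: str, num_leaves: int) -> int:
    """Map a docid to a leaf index by hashing its partition key.

    MD5 is used only because it is stable across processes; any
    stable hash would do for this purpose.
    """
    key = partition_key(doc_id)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_leaves
```

With this, leaf_for("alice:post-1", 8) and leaf_for("alice:post-2", 8)
land on the same leaf, so a query scoped to one user would only need to
touch one partition.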

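The point about empty nodes can be seen in a small sketch of the proxy
merge step. Again this is Python for illustration only, with assumed
names (merge_view_rows) and an assumed row shape of (key, value)
tuples. It also compares keys with plain Python ordering, which is a
simplification of Couch's real view collation.

```python
import heapq


def merge_view_rows(*streams):
    """Lazily merge already-sorted (key, value) row streams by key.

    Each stream stands in for the sorted view response of one child
    node. heapq.merge holds only one pending row per stream, so memory
    grows with the number of streams, not the number of rows.
    """
    return heapq.merge(*streams, key=lambda row: row[0])


# Three hypothetical leaf responses; node_b has nothing to contribute.
node_a = [("a", 1), ("c", 3)]
node_b = []
node_c = [("b", 2), ("d", 4)]

merged = list(merge_view_rows(node_a, node_b, node_c))
```

Here merged comes out as [("a", 1), ("b", 2), ("c", 3), ("d", 4)], and
the empty node_b drops out of the merge as soon as it is opened. Since
the output is itself a sorted stream, a proxy higher in the tree can
feed it into the same function.
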
Chris

-- 
Chris Anderson
http://jchris.mfdz.com
