On Dec 18, 2012, at 6:50 PM, Yonik Seeley <[email protected]> wrote:

> On Tue, Dec 18, 2012 at 6:28 PM, Steve McKay <[email protected]> wrote:
>> I'm looking at sending stats facets between shards to speed up merging.
>> Rather than have one node responsible for merging the facet sets from every
>> shard, each facet set is partitioned by term and then each shard merges one
>> partition of each facet set. A-D, E-G, etc.
> 
> Could you give a concrete example of what you're thinking (say 3
> shards and just a few terms?)

Take three shards and the field spending_category, which has 6 terms: C, D, G,
I, L, O. Currently, when a stats request is faceted on spending_category, the
controller receives a result from each shard with all 6 facets present and
merges them itself. What I'm proposing is that each shard partition its
result into {C, D}, {G, I}, {L, O}. Shards 2 and 3 then send facets C and D to
shard 1 for merging, and likewise for {G, I} on shard 2 and {L, O} on shard 3.
The result each shard sends back to the controller then covers a disjoint set
of terms, independent of the other shard results, so the final merge is
trivial.
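
To make the data flow concrete, here is a rough Python sketch of the exchange
described above. It is not Solr code: the function names, the fixed partition
sets, and the reduced stats record (count, sum, min, max only) are assumptions
made for illustration, and the shard-to-shard transfer is simulated in-process.

```python
def merge_stats(a, b):
    """Combine two stats records for the same term (simplified stats)."""
    return {
        "count": a["count"] + b["count"],
        "sum": a["sum"] + b["sum"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
    }


def split_by_partition(shard_result, partitions):
    """Split one shard's facet map into one sub-map per term partition."""
    return [
        {term: stats for term, stats in shard_result.items() if term in part}
        for part in partitions
    ]


def distributed_merge(shard_results, partitions):
    """Shard j merges partition j of every shard's result.

    Assumes one partition per shard; outgoing[i][j] is what shard i
    would send to shard j.
    """
    outgoing = [split_by_partition(r, partitions) for r in shard_results]
    merged_per_shard = []
    for j in range(len(partitions)):
        merged = {}
        for i in range(len(shard_results)):
            for term, stats in outgoing[i][j].items():
                if term in merged:
                    merged[term] = merge_stats(merged[term], stats)
                else:
                    merged[term] = dict(stats)
        merged_per_shard.append(merged)
    return merged_per_shard


def controller_merge(merged_per_shard):
    """Partitions are disjoint, so the controller only concatenates."""
    final = {}
    for part in merged_per_shard:
        final.update(part)
    return final
```

With partitions [{"C", "D"}, {"G", "I"}, {"L", "O"}], shard 1 ends up holding
the fully merged stats for C and D only, and the controller never has to
combine two records for the same term.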

In that example, merging doesn't take significant time either way. What
motivates this is doing top-k operations on facet sets of large cardinality,
e.g. 1 million unique elements, with 200,000 elements being returned by each
of 6 shards. Currently, with all the merging done on the controller, a top-10
query spends most of its time merging shard results. Distributing the merge
step across the shards should significantly reduce that time.
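
One consequence for top-k, sketched below in Python under the same caveats
(hypothetical function names, not Solr's API, and ranking by "sum" is only an
assumed sort metric): once each shard holds fully merged stats for its own
partition, it can cut its result down to k entries before replying, so the
controller picks from at most k entries per shard rather than merging 200,000
per shard.

```python
import heapq


def local_top_k(merged_partition, k, key="sum"):
    """Top k terms within one shard's fully merged partition."""
    return heapq.nlargest(
        k, merged_partition.items(), key=lambda item: item[1][key]
    )


def global_top_k(per_shard_top, k, key="sum"):
    """Controller step: choose the top k among the per-shard candidates.

    This is exact because partitions are disjoint and each one is fully
    merged, so any term in the global top k is also in the top k of its
    own partition.
    """
    candidates = [item for top in per_shard_top for item in top]
    return heapq.nlargest(k, candidates, key=lambda item: item[1][key])
```

This cut is only safe after the partitioned merge. Truncating each shard's raw,
unmerged result to k entries could drop a term whose combined value across
shards would have made the top k.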

