On Dec 18, 2012, at 6:50 PM, Yonik Seeley <[email protected]> wrote:
> On Tue, Dec 18, 2012 at 6:28 PM, Steve McKay <[email protected]> wrote: >> I'm looking at sending stats facets between shards to speed up merging. >> Rather than have one node responsible for merging the facet sets from every >> shard, each facet set is partitioned by term and then each shard merges one >> partition of each facet set. A-D, E-G, etc. > > Could you give a concrete example of what you're thinking (say 3 > shards and just a few terms?) Take three shards and the field spending_category, which has 6 terms: C, D, G, I, L, O. Currently when a stats request is faceted on spending_category the controller will receive results for each shard with all 6 facets present, and merge the results together. What I'm talking about is having each shard partition its result into {C, D}, {G, I}, {L, O}. Then shard 2 and 3 send facets C and D to shard 1 for merging and likewise for the other shards. Then the result each shard sends back to the controller is independent of the other shard results and merging is trivial. In that example, merging doesn't take significant time either way. What motivates this is doing top-k operations on facet sets of large cardinality, e.g. 1 million unique elements, 200,000 elements being returned by each of 6 shards. Currently, doing all the merging on the controller, a top-10 query spends most of its time merging shard results. Distributing the merge step should significantly improve that. --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
