On 20 Feb 2012, at 19:07, Vladimir Blagojevic wrote: > On 12-02-20 3:43 PM, Manik Surtani wrote: >> >> I was under the impression reduce is distributed too? Don't we do the >> mapping on each node, then a first-pass reduce on each node too, before >> streaming results back to the caller node? > What we do in first-pass reduce is essentially combine and we should not do > that blindly because this eager reduction/combine only works when reduce > function is both commutative and associative! This can lead to problems when > it is not: > http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/ > > So yes first-pass reduce is distributed but second-phase reduce should be > distributed as well! Currently it is not!
Ok. >> This all makes sense as well, however one problem with the consistent hash >> approach is that it is prone to change when there is a topology change. How >> would you deal with that? Would you maintain a history of consistent hashes? >> >> > I don't think I understand! Even if there is a topology change intermediate > results Map<KOut, List<VOut>> will be migrated, all we need is KOut's, we can > hash it and find out where List<VOut> are, no? Well, if you assign a set of tasks to specific nodes based on a consistent hash, the topology then changes, you'd lose the information on where you sent specific tasks. Cheers Manik -- Manik Surtani [email protected] twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
_______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
