Re: [infinispan-dev] Map Reduce 2.0

Vladimir Blagojevic Mon, 20 Feb 2012 11:08:12 -0800

On 12-02-20 3:43 PM, Manik Surtani wrote:

> I was under the impression reduce is distributed too?  Don't we do the mapping
> on each node, then a first-pass reduce on each node too, before streaming
> results back to the caller node?

What we do in first-pass reduce is essentially a combine, and we should not do that blindly, because this eager reduction/combine only works when the reduce function is both /commutative/ and /associative/! This can lead to problems when it is not:
http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/

So yes, the first-pass reduce is distributed, but the second-phase reduce should be distributed as well! Currently it is not!
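
To make the combiner caveat concrete, here is a minimal sketch in plain Java. It does not use the Infinispan API; the class and method names (CombinerDemo, reduceWithCombine) are made up for illustration. It reduces the same values once in a single pass and once with a per-node combine step followed by a final reduce. Summing gives the same answer both ways, but taking the mean does not, because a mean of means is not the mean of all values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo class, not part of Infinispan.
class CombinerDemo {

    // Commutative and associative: safe to apply eagerly on each node.
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Not associative: mean(mean(a), mean(b)) != mean(a + b) in general.
    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    // Reduce each node's partition first (the "combine" step),
    // then reduce the partial results on the caller.
    static double reduceWithCombine(List<List<Double>> partitions,
                                    Function<List<Double>, Double> reducer) {
        List<Double> partials = new ArrayList<>();
        for (List<Double> partition : partitions) {
            partials.add(reducer.apply(partition));
        }
        return reducer.apply(partials);
    }

    public static void main(String[] args) {
        // All values for one key, and the same values split across two nodes.
        List<Double> all = Arrays.asList(1.0, 2.0, 3.0, 10.0);
        List<List<Double>> partitions = Arrays.asList(
                Arrays.asList(1.0, 2.0, 3.0),
                Arrays.asList(10.0));

        // prints "sum:  16.0 vs 16.0"
        System.out.println("sum:  " + sum(all) + " vs "
                + reduceWithCombine(partitions, CombinerDemo::sum));
        // prints "mean: 4.0 vs 6.0"
        System.out.println("mean: " + mean(all) + " vs "
                + reduceWithCombine(partitions, CombinerDemo::mean));
    }
}
```

A reducer like mean can still be made combinable by carrying (sum, count) pairs through the combine step and dividing only at the end, but that is a change to the reducer, not something the framework can do blindly.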

> This all makes sense as well, however one problem with the consistent hash
> approach is that it is prone to change when there is a topology change.  How
> would you deal with that?  Would you maintain a history of consistent hashes?

I don't think I understand! Even if there is a topology change, the intermediate results Map<KOut, List<VOut>> will be migrated. All we need is the KOut keys: we can hash each one and find out where its List<VOut> is, no?
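
A rough sketch of the lookup described above, in plain Java. This is not Infinispan's consistent hash: KeyLocator and ownerOf are hypothetical names, and plain modulo hashing over the node list stands in for the real hash function. The only point is that the owner of a KOut key is always computed against the current view, so no history of earlier hashes is needed, assuming the grid has already migrated the entries after the topology change.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Infinispan.
class KeyLocator {

    // Map a key to one node of the given view. Plain modulo hashing is an
    // assumption made for brevity; a real consistent hash would move far
    // fewer keys when the view changes.
    static String ownerOf(Object key, List<String> view) {
        int index = Math.floorMod(key.hashCode(), view.size());
        return view.get(index);
    }

    public static void main(String[] args) {
        List<String> oldView = Arrays.asList("nodeA", "nodeB", "nodeC");
        List<String> newView = Arrays.asList("nodeA", "nodeB", "nodeC", "nodeD");

        // The KOut keys are all the reduce phase needs to know.
        for (String kOut : Arrays.asList("apple", "banana", "cherry")) {
            System.out.println(kOut
                    + ": owner before = " + ownerOf(kOut, oldView)
                    + ", owner after = " + ownerOf(kOut, newView));
        }
    }
}
```

The owner of a key may differ between the two views. The lookup still works because it only ever consults the view that is current at the time of the call.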


Vladimir

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
