On 12-02-20 3:43 PM, Manik Surtani wrote:
> I was under the impression reduce is distributed too? Don't we do the mapping
> on each node, then a first-pass reduce on each node too, before streaming
> results back to the caller node?
What we do in the first-pass reduce is essentially a combine, and we should
not do that blindly, because this eager reduction/combine only works when
the reduce function is both /commutative/ and /associative/! When it is
not, this can lead to problems:
http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
So yes, the first-pass reduce is distributed, but the second-phase reduce
should be distributed as well! Currently it is not!
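
To make the combiner caveat concrete, here is a minimal sketch in plain Java. The class CombinerDemo and its methods are hypothetical and are not part of the Infinispan API; the node-local lists simply stand in for the values mapped for one key on two nodes. A mean is not associative, so reducing per node and then reducing the partial results gives a different answer than reducing everything once, whereas a sum is safe either way.

```java
import java.util.List;

// Hypothetical illustration only; not Infinispan code.
final class CombinerDemo {

    // Reduce function: arithmetic mean. Not associative, so it is
    // unsafe to apply as an eager first-pass reduce (combine).
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Reduce function: sum. Commutative and associative, so it is
    // safe to apply per node and again on the partial results.
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Double> nodeA = List.of(1.0, 2.0, 3.0); // values for one key on node A
        List<Double> nodeB = List.of(10.0);           // values for the same key on node B
        List<Double> all = List.of(1.0, 2.0, 3.0, 10.0);

        // Reduce once over everything vs. reduce per node, then reduce the partials.
        double directMean = mean(all);
        double combinedMean = mean(List.of(mean(nodeA), mean(nodeB)));
        double directSum = sum(all);
        double combinedSum = sum(List.of(sum(nodeA), sum(nodeB)));

        System.out.println("mean: direct=" + directMean + " combined=" + combinedMean);
        System.out.println("sum:  direct=" + directSum + " combined=" + combinedSum);
    }
}
```

With these inputs the two mean results differ while the two sum results agree, which is why the first-pass reduce should not be applied blindly.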
> This all makes sense as well, however one problem with the consistent hash
> approach is that it is prone to change when there is a topology change. How
> would you deal with that? Would you maintain a history of consistent hashes?
I don't think I understand! Even if there is a topology change, the
intermediate results Map<KOut, List<VOut>> will be migrated. All we need
is the KOut: we can hash it and find out where its List<VOut> is, no?
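
A rough sketch of that lookup, assuming a simple hash ring built from the current cluster view. OwnerLookupSketch, spread and ownerOf are hypothetical names and do not reflect Infinispan's actual ConsistentHash implementation. The point is only that the owner of a KOut is always computed against the current view, so no history of earlier hashes is needed as long as the data has been migrated to match.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Infinispan's ConsistentHash.
final class OwnerLookupSketch {

    // Ring position -> node name, built from the current cluster view.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    OwnerLookupSketch(List<String> currentView) {
        for (String node : currentView) {
            ring.put(spread(node.hashCode()), node);
        }
    }

    // Mix the high bits into the low bits and force a non-negative position.
    static int spread(int h) {
        h ^= (h >>> 16);
        return h & 0x7fffffff;
    }

    // The owner is the first node at or after the key's position,
    // wrapping around to the start of the ring if there is none.
    String ownerOf(Object key) {
        int position = spread(key.hashCode());
        Map.Entry<Integer, String> entry = ring.ceilingEntry(position);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        String kOut = "some-intermediate-key";

        OwnerLookupSketch before = new OwnerLookupSketch(List.of("node-A", "node-B", "node-C"));
        System.out.println("owner before topology change: " + before.ownerOf(kOut));

        // After node-B leaves, rebuild the ring from the new view and hash again.
        OwnerLookupSketch after = new OwnerLookupSketch(List.of("node-A", "node-C"));
        System.out.println("owner after topology change:  " + after.ownerOf(kOut));
    }
}
```

Any node holding the same view computes the same owner for a given KOut, so the second-phase reduce can be routed with nothing more than the key and the current topology.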
Vladimir
_______________________________________________
infinispan-dev mailing list
https://lists.jboss.org/mailman/listinfo/infinispan-dev