On 20 Feb 2012, at 19:07, Vladimir Blagojevic wrote:

> On 12-02-20 3:43 PM, Manik Surtani wrote:
>> 
>> I was under the impression reduce is distributed too?  Don't we do the 
>> mapping on each node, then a first-pass reduce on each node too, before 
>> streaming results back to the caller node?
> What we do in the first-pass reduce is essentially a combine, and we should 
> not do that blindly, because this eager reduction/combine only works when the 
> reduce function is both commutative and associative! This can lead to problems 
> when it is not: 
> http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
> 
> So yes, the first-pass reduce is distributed, but the second-phase reduce 
> should be distributed as well! Currently it is not!

Ok.
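
To make the point about eager combining concrete, here is a minimal sketch in plain Java. It is not Infinispan code: the class and method names (CombinerDemo, reduceOnce, combineThenReduce) are hypothetical, and the reducer is modelled as a function over the full list of values for one key. A sum survives a per-node combine; a mean does not, because applying it to its own partial results changes the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo class; none of these names come from Infinispan.
class CombinerDemo {

    // Reduce every value for a key in a single pass (no combine step).
    static double reduceOnce(List<List<Double>> perNode,
                             Function<List<Double>, Double> reduce) {
        List<Double> all = new ArrayList<>();
        for (List<Double> values : perNode) {
            all.addAll(values);
        }
        return reduce.apply(all);
    }

    // Reduce on each node first (the eager combine), then reduce the partials.
    static double combineThenReduce(List<List<Double>> perNode,
                                    Function<List<Double>, Double> reduce) {
        List<Double> partials = new ArrayList<>();
        for (List<Double> values : perNode) {
            partials.add(reduce.apply(values));
        }
        return reduce.apply(partials);
    }

    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        // Values for one key, spread over two nodes.
        List<List<Double>> perNode = Arrays.asList(
                Arrays.asList(1.0, 2.0, 3.0),
                Arrays.asList(10.0));

        System.out.println(reduceOnce(perNode, CombinerDemo::sum));         // 16.0
        System.out.println(combineThenReduce(perNode, CombinerDemo::sum));  // 16.0
        System.out.println(reduceOnce(perNode, CombinerDemo::mean));        // 4.0
        System.out.println(combineThenReduce(perNode, CombinerDemo::mean)); // 6.0
    }
}
```

With the mean, the per-node partials are 2.0 and 10.0, so the second pass yields 6.0 instead of 4.0. A reducer like that either has to skip the combine step or carry extra state (for example a sum and a count).
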
>> This all makes sense as well; however, one problem with the consistent hash 
>> approach is that it is prone to change when there is a topology change.  How 
>> would you deal with that?  Would you maintain a history of consistent hashes?
>> 
>> 
> I don't think I understand! Even if there is a topology change, the 
> intermediate results Map<KOut, List<VOut>> will be migrated. All we need is the 
> KOut keys: we can hash each one and find out where its List<VOut> is, no?

Well, if you assign a set of tasks to specific nodes based on a consistent 
hash and the topology then changes, you'd lose the information on where you 
sent specific tasks.
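
A rough sketch of that concern, using a toy hash ring rather than Infinispan's actual ConsistentHash: the class ToyHashRing, its methods, and the node and key names are all made up for illustration. Keys are assigned to owners, a node joins, and re-hashing a key afterwards gives its current owner, which is not necessarily the node the task was originally sent to.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy ring with one point per node; an assumption for illustration only.
class ToyHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) {
        ring.put(position(node), node);
    }

    // Owner is the first node at or after the key's position, wrapping around.
    String ownerOf(Object key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(position(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Spread hashCode() bits so short strings do not cluster on the ring.
    static int position(Object o) {
        int h = o.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Record which node each key (task) would be sent to right now.
    static Map<String, String> assign(ToyHashRing ring, List<String> keys) {
        Map<String, String> sentTo = new LinkedHashMap<>();
        for (String key : keys) {
            sentTo.put(key, ring.ownerOf(key));
        }
        return sentTo;
    }

    public static void main(String[] args) {
        ToyHashRing ring = new ToyHashRing();
        ring.addNode("nodeA");
        ring.addNode("nodeB");
        ring.addNode("nodeC");

        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("key" + i);
        }
        Map<String, String> sentTo = assign(ring, keys);

        ring.addNode("nodeD"); // topology change after the tasks were sent

        int moved = 0;
        for (String key : keys) {
            if (!ring.ownerOf(key).equals(sentTo.get(key))) {
                moved++;
            }
        }
        System.out.println(moved + " of " + keys.size() + " keys changed owner");
    }
}
```

In this sketch the sentTo map plays the role of the "history": without it (or a copy of the old hash), the new ring alone cannot say where a task originally went.
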

Cheers
Manik
--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev