On 18.2.2014 16:36, Vladimir Blagojevic wrote:
> On 2/18/2014, 4:59 AM, Dan Berindei wrote:
>>
>> The limitation we have now is that in the reduce phase, the entire
>> list of values for one intermediate key must be in memory at once. I
>> think Hadoop only loads a block of intermediate values into memory at
>> once, and can even sort the intermediate values (with a user-supplied
>> comparison function) so that the reduce function can work on a sorted
>> list without loading all the values into memory itself.
>>
> Dan and others,
>
> This is where Sanne's idea comes into play. Why collect the entire list
> of intermediate values for each intermediate key and only then invoke
> reduce on them, when we could invoke reduce each time a new intermediate
> value is inserted?
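For the archives, Vladimir's proposal could be sketched roughly like this (a hypothetical illustration, not Infinispan's actual API: the class and method names are made up). Instead of buffering the whole value list per key, each emitted intermediate value is immediately folded into a per-key accumulator:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch of reduce-on-insert: O(1) accumulator state per key instead of
// O(number of values) buffered per key.
public class IncrementalReduce<K, V> {
    private final Map<K, V> accumulators = new HashMap<>();
    private final BinaryOperator<V> reducer;

    public IncrementalReduce(BinaryOperator<V> reducer) {
        this.reducer = reducer;
    }

    // Called for every emitted intermediate (key, value) pair; folds the
    // new value into the running accumulator for that key.
    public void emit(K key, V value) {
        accumulators.merge(key, value, reducer);
    }

    public V result(K key) {
        return accumulators.get(key);
    }

    public static void main(String[] args) {
        IncrementalReduce<String, Integer> wordCount =
            new IncrementalReduce<>(Integer::sum);
        for (String w : new String[] {"a", "b", "a", "a"}) {
            wordCount.emit(w, 1);
        }
        System.out.println(wordCount.result("a")); // prints 3
        System.out.println(wordCount.result("b")); // prints 1
    }
}
```

Note the implicit assumption: this only produces the same answer as list-at-once reduce when the reduce function is associative (and commutative, if insertion order is not guaranteed).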
I don't know about MR in Infinispan, but MR in CouchDB does something very
similar to what you describe. To actually get a final result, they have to
perform an entire tree of reductions, and the reduce function has to
distinguish between a "first-level" reduce (on bare values) and a rereduce
(on intermediate results from previous reductions). The two are _not_
always the same, and it's fairly confusing.

LT

> 
> https://issues.jboss.org/browse/ISPN-3999
> 
> Cheers,
> Vladimir
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
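P.S. To make the reduce/rereduce distinction concrete, here is a minimal sketch in the style of CouchDB's classic counting example (the method names are illustrative, not CouchDB's actual API). A first-level reduce sees raw mapped values, while a rereduce sees partial results from earlier reductions, so the two bodies differ:

```java
import java.util.List;

public class CouchStyleCount {
    // First-level reduce: sees the raw mapped values, so counting them
    // is just the size of the batch.
    static long reduce(List<?> values) {
        return values.size();
    }

    // Rereduce: sees partial counts from earlier reductions, so it must
    // SUM them -- counting them again would be wrong.
    static long rereduce(List<Long> partials) {
        return partials.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // 5 raw values, reduced in two batches of 3 and 2.
        long p1 = reduce(List.of("x", "x", "x")); // 3
        long p2 = reduce(List.of("x", "x"));      // 2
        System.out.println(rereduce(List.of(p1, p2))); // prints 5

        // The confusing part: applying the first-level reduce to the
        // partials would yield 2 (the number of partials), not 5.
        System.out.println(reduce(List.of(p1, p2))); // prints 2
    }
}
```

For a function like sum, reduce and rereduce happen to coincide; for count (and many others) they do not, which is exactly the pitfall described above.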
