On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote:
> On 2/18/2014, 4:59 AM, Dan Berindei wrote:
>>
>> The limitation we have now is that in the reduce phase, the entire
>> list of values for one intermediate key must be in memory at once. I
>> think Hadoop only loads a block of intermediate values in memory at
>> once, and can even sort the intermediate values (with a user-supplied
>> comparison function) so that the reduce function can work on a sorted
>> list without loading the values in memory itself.
>>
> Dan and others,
>
> This is where Sanne's idea comes into play. Why collect the entire list
> of intermediate values for each intermediate key and then invoke reduce
> on those values, when we could invoke reduce each time a new
> intermediate value is inserted?
>

Because you can't. What you are describing is combining rather than reducing. If the MapReduceTask has a combiner, you can execute the combiner on a subset (in your case, two) of the values with the same key and output one value. But this is not always possible.

> https://issues.jboss.org/browse/ISPN-3999
>
> Cheers,
> Vladimir
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
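To make the distinction concrete, here is a minimal plain-Java sketch (not Infinispan API; the class and method names are made up for illustration). An associative combiner such as sum can be applied incrementally as each intermediate value arrives, but a reduce-only function such as average gives a different answer if you feed its own intermediate result back in:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: why incremental "reduce on insert" only works
// when the function is an associative combiner.
public class CombineVsReduce {

    // Sum of a list of doubles.
    static double sum(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).sum();
    }

    // Average of a list of doubles.
    static double avg(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 4.0);

        // Sum is associative: combining the first two values as they
        // arrive, then folding in the third, matches reducing the full
        // list at once.
        double pairwiseSum = sum(Arrays.asList(sum(Arrays.asList(1.0, 2.0)), 4.0));
        System.out.println(pairwiseSum == sum(values)); // same result

        // Average is not: avg(avg(1, 2), 4) = avg(1.5, 4) = 2.75, while
        // avg(1, 2, 4) = 2.333..., so reducing on each insert is wrong.
        double pairwiseAvg = avg(Arrays.asList(avg(Arrays.asList(1.0, 2.0)), 4.0));
        System.out.println(pairwiseAvg == avg(values)); // different result
    }
}
```

This is exactly why a combiner, when present, can shrink the intermediate value list early, while a general reducer still needs the complete list of values for a key.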
