[ 
https://issues.apache.org/jira/browse/HADOOP-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442627#comment-13442627
 ] 

Dhruv Kumar commented on HADOOP-8705:
-------------------------------------

Ahmed, definitely. Another advantage of a larger, pluggable MapOutputBuffer
is a potential reduction in speculative execution on other nodes, which
should improve network performance on unbalanced clusters.

Kapil, the HaLoop paper I linked in this JIRA describes storing intermediate
map results for consumption by reducers. Their Apache-licensed code is on
Google Code if you want to dig into the specifics.

Here's a related use case for Memcached (or any other caching layer) with
Hadoop, although it uses a slightly different plug-in point:
http://www.slideserve.com/layne/mapreduce-and-databases
                
> Add JSR 107 Caching support 
> ----------------------------
>
>                 Key: HADOOP-8705
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8705
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dhruv Kumar
>
> Having a cache on mappers and reducers could be very useful for some use
> cases, including but not limited to:
> 1. Iterative MapReduce programs: some machine learning algorithms (see
> Mahout) need frequent access to invariant data on every MapReduce iteration
> until convergence. A cache on such nodes could allow easy access to the
> hot set of data without going all the way to the distributed cache.
> 2. Storing intermediate map and reduce outputs in memory to reduce
> shuffling time. This optimization is discussed at length in the HaLoop paper
> (http://www.ics.uci.edu/~yingyib/papers/HaLoop_camera_ready.pdf).
> There are other scenarios as well where a cache could come in handy.
> It would be nice to have some sort of pluggable support for JSR 107
> compliant caches.
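
To make use case 1 concrete, here is a minimal sketch of what a pluggable
task-local cache might look like. Everything in it is an assumption made for
illustration: `TaskCache`, `LruTaskCache`, and `InvariantLookup` are
hypothetical names, not the JSR 107 `javax.cache` API and not existing Hadoop
classes, and the "slow loader" merely stands in for a read from the
distributed cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plug-in point; a real JSR 107 provider could sit behind it.
interface TaskCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

// One possible implementation: a bounded LRU cache local to the task JVM.
class LruTaskCache<K, V> implements TaskCache<K, V> {
    private final Map<K, V> entries;

    LruTaskCache(final int capacity) {
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override
    public V get(K key) {
        return entries.get(key);
    }

    @Override
    public void put(K key, V value) {
        entries.put(key, value);
    }
}

// Stand-in for a mapper that needs the same invariant data on each iteration.
class InvariantLookup {
    private final TaskCache<String, double[]> cache;
    private final Function<String, double[]> slowLoader;
    int loads = 0; // counts how often the slow path was taken

    InvariantLookup(TaskCache<String, double[]> cache,
                    Function<String, double[]> slowLoader) {
        this.cache = cache;
        this.slowLoader = slowLoader;
    }

    double[] fetch(String key) {
        double[] value = cache.get(key);
        if (value == null) {
            // Cache miss: fall back to the slow source and remember the result.
            value = slowLoader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }
}

class CacheSketch {
    public static void main(String[] args) {
        InvariantLookup lookup = new InvariantLookup(
                new LruTaskCache<String, double[]>(2),
                key -> new double[] {key.length()});
        for (int iteration = 0; iteration < 3; iteration++) {
            lookup.fetch("centroids");
        }
        System.out.println("loads = " + lookup.loads);
    }
}
```

In this sketch the loader runs only on the first iteration; later iterations
are served from the cache. Swapping `LruTaskCache` for another `TaskCache`
implementation is the kind of pluggability the issue asks for, though the
actual integration point in Hadoop is left open here.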
