[ https://issues.apache.org/jira/browse/HADOOP-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442654#comment-13442654 ]
Dhruv Kumar commented on HADOOP-8705:
-------------------------------------
For the impatient, I have uploaded a presentation about HaLoop that I gave
some time ago in graduate school:
http://www.slideserve.com/dkumar/optimizing-iterative-mapreduce-jobs
> Add JSR 107 Caching support
> ----------------------------
>
> Key: HADOOP-8705
> URL: https://issues.apache.org/jira/browse/HADOOP-8705
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Dhruv Kumar
>
> Having a cache on mappers and reducers could be very useful for some use
> cases, including but not limited to:
> 1. Iterative MapReduce programs: Some machine learning algorithms (see
> Mahout) need frequent access to invariant data on each iteration of
> MapReduce until convergence. A cache on such nodes could allow easy access
> to the hot set of data without going all the way to the distributed cache.
> 2. Storing intermediate map and reduce outputs in memory to reduce
> shuffling time. This optimization is discussed at length in the HaLoop paper
> (http://www.ics.uci.edu/~yingyib/papers/HaLoop_camera_ready.pdf).
> There are other scenarios as well where a cache could come in handy.
> It would be nice to have some sort of pluggable support for JSR 107 compliant
> caches.
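
To make the first use case concrete, here is a rough sketch of a read-through
cache that keeps invariant data in task memory across iterations. It uses only
the JDK so that it runs standalone. The names TaskCache, LruTaskCache,
InvariantDataReader and IterativeCacheSketch are hypothetical; they are not
Hadoop classes and not the JSR 107 (javax.cache) API. The slow loader is only a
stand-in for a distributed cache read. In a pluggable design, a JSR 107
compliant provider could presumably sit behind an interface like TaskCache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical pluggable cache contract (an assumption, not a Hadoop or JSR 107 type). */
interface TaskCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

/** Minimal in-memory LRU cache built on LinkedHashMap's access ordering. */
class LruTaskCache<K, V> implements TaskCache<K, V> {
    private final Map<K, V> entries;

    LruTaskCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict once the cache grows past its capacity.
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized V get(K key) {
        return entries.get(key);
    }

    @Override
    public synchronized void put(K key, V value) {
        entries.put(key, value);
    }
}

/** Read-through wrapper: consult the cache first, fall back to the slow source. */
class InvariantDataReader<K, V> {
    private final TaskCache<K, V> cache;
    private final Function<K, V> slowLoader; // stand-in for a distributed cache read
    private int slowLoads = 0;

    InvariantDataReader(TaskCache<K, V> cache, Function<K, V> slowLoader) {
        this.cache = cache;
        this.slowLoader = slowLoader;
    }

    V read(K key) {
        V value = cache.get(key);
        if (value == null) {
            // Cache miss: pay the slow path once, then remember the result.
            value = slowLoader.apply(key);
            slowLoads++;
            cache.put(key, value);
        }
        return value;
    }

    int slowLoads() {
        return slowLoads;
    }
}

public class IterativeCacheSketch {
    public static void main(String[] args) {
        TaskCache<String, double[]> cache = new LruTaskCache<>(2);
        InvariantDataReader<String, double[]> reader =
            new InvariantDataReader<>(cache, key -> new double[] {key.length(), 1.0});

        // Three "iterations" that each need the same two invariant records.
        for (int iteration = 0; iteration < 3; iteration++) {
            reader.read("centroid-a");
            reader.read("centroid-b");
        }
        System.out.println("slow loads: " + reader.slowLoads()); // prints "slow loads: 2"
    }
}
```

Only the first iteration pays for the slow reads; later iterations are served
from memory as long as the hot set fits within the configured capacity.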