[ 
https://issues.apache.org/jira/browse/HADOOP-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhruv Kumar updated HADOOP-8705:
--------------------------------

    Description: 
Having a cache on mappers and reducers could be very useful for some use cases, 
including but not limited to:

1. Iterative MapReduce programs: some machine learning algorithms (see Mahout) 
repeatedly need access to invariant data on each iteration of MapReduce until 
convergence. A cache on such nodes would allow easy access to this hot set of 
data without going all the way to the distributed cache. This optimization is 
described by Jimmy Lin et al. in the paper "Low-Latency, High-Throughput 
Access to Static Global Resources within the Hadoop Framework" 
(http://hcil2.cs.umd.edu/trs/2009-01/2009-01.pdf).

2. Storing intermediate map outputs in memory to reduce shuffle time. This 
optimization is discussed at length in HaLoop 
(http://www.ics.uci.edu/~yingyib/papers/HaLoop_camera_ready.pdf) and by Shubin 
Zhang in "Accelerating MapReduce with Distributed Memory Cache", presented at 
ICPADS 2009. 

There are some other scenarios as well where having a cache could come in 
handy. 
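As an illustration of scenario 1, a mapper could keep a process-local cache of the invariant side data so that repeated keys are served from memory instead of being refetched. The following is a minimal, self-contained Java sketch of that pattern; fetchFromDistributedCache is a hypothetical stand-in for the real distributed-cache lookup, and the class and key names are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

public class InMapperCacheSketch {
    // Counts how often we fall through to the (simulated) expensive fetch.
    static int remoteFetches = 0;

    // Hypothetical stand-in for reading invariant data from the distributed cache.
    static String fetchFromDistributedCache(String key) {
        remoteFetches++;
        return "value-for-" + key;
    }

    // Process-local cache shared across map() calls within one task JVM.
    static final Map<String, String> cache = new HashMap<String, String>();

    static String lookup(String key) {
        String v = cache.get(key);
        if (v == null) {                       // miss: fetch once, then memoize
            v = fetchFromDistributedCache(key);
            cache.put(key, v);
        }
        return v;
    }

    public static void main(String[] args) {
        // Simulate a mapper seeing the same hot keys over and over.
        String[] input = {"a", "b", "a", "a", "b"};
        for (String k : input) {
            lookup(k);
        }
        // Only two distinct keys, so only two expensive fetches.
        System.out.println("remoteFetches=" + remoteFetches);
    }
}
```

The point of the sketch is that the expensive lookup runs once per distinct key rather than once per record, which is the effect a framework-level cache would provide without each job hand-rolling it.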

JSR 107 aims to standardize caching interfaces for Java applications, and 
popular caching solutions such as Ehcache and Memcached have JSR 107 wrappers. 
Hence, it would be nice to have pluggable support for JSR 107-compliant caches 
in Hadoop.
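To make the "pluggable" part concrete: JSR 107's core abstraction is a Cache with get/put operations obtained from a provider. A rough, self-contained sketch of what a pluggable hook could look like follows; KeyValueCache mirrors a small subset of javax.cache.Cache, and HeapCache, cacheFor, and the key names are hypothetical, not existing Hadoop APIs:

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal subset of the JSR 107 javax.cache.Cache interface, so that any
// compliant provider (Ehcache, a Memcached wrapper, ...) could be adapted.
interface KeyValueCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Hypothetical default: a simple on-heap cache backed by a concurrent map.
class HeapCache<K, V> implements KeyValueCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<K, V>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
}

public class PluggableCacheSketch {
    // A task would obtain its cache through a factory selected by job
    // configuration (e.g. an implementation class name), not hard-coded.
    static <K, V> KeyValueCache<K, V> cacheFor(String implClassName) {
        // Sketch only: real code would reflectively instantiate implClassName.
        return new HeapCache<K, V>();
    }

    public static void main(String[] args) {
        KeyValueCache<String, String> cache = cacheFor("example.HeapCache");
        cache.put("centroids", "serialized-cluster-centroids");
        System.out.println(cache.get("centroids"));
    }
}
```

Because tasks would program against the interface rather than a concrete cache, swapping a local heap cache for a shared JSR 107 provider would be a configuration change rather than a code change.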

  was:
Having a cache on mappers and reducers could be very useful for some use cases, 
including but not limited to:

1. Iterative MapReduce programs: some machine learning algorithms (see Mahout) 
repeatedly need access to invariant data on each iteration of MapReduce until 
convergence. A cache on such nodes would allow easy access to this hot set of 
data without going all the way to the distributed cache.

2. Storing intermediate map and reduce outputs in memory to reduce shuffle 
time. This optimization is discussed at length in HaLoop 
(http://www.ics.uci.edu/~yingyib/papers/HaLoop_camera_ready.pdf).

There are some other scenarios as well where having a cache could come in 
handy. 

It would be nice to have pluggable support for JSR 107-compliant caches. 

    
> Add JSR 107 Caching support 
> ----------------------------
>
>                 Key: HADOOP-8705
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8705
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dhruv Kumar
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
