[ https://issues.apache.org/jira/browse/HADOOP-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Kowshik updated HADOOP-1032:
-----------------------------------

    Affects Version/s: 0.11.2

> Support for caching Job JARs 
> -----------------------------
>
>                 Key: HADOOP-1032
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1032
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.11.2
>            Reporter: Gautam Kowshik
>            Priority: Minor
>
> Jobs often need to be rerun a number of times (e.g. a job that reads crawled
> data again and again), so having to upload the job jar to every node on each
> run is cumbersome. We need a caching mechanism to boost performance. Here are
> the proposed features for job-specific caching of jars/conf files:
> - Ability to resubmit jobs with jars without having to propagate the same jar
> to all nodes again.
>     The idea is to keep a store (at a path specified by the user in job.xml?)
> local to each task node, so as to speed up task initiation on tasktrackers.
> This assumes that the jar does not change during an MR task.
> - An independent DFS store to upload jars to (a Distributed File Cache?) that
> is not cleaned up between jobs.
>     This might need user-level configuration telling the JobClient to upload
> files to the DFSCache instead of the DFS.
> https://issues.apache.org/jira/browse/HADOOP-288 facilitates this. Our local
> cache can be a client of the DFS Cache.
> - A standard cache mechanism that checks the local store for changes and
> fetches from DFS if the local copy is found dirty.
>     This does away with versioning. The DFSCache supports an MD5 checksum
> check, which we can use.
> Anything else? Suggestions? Thoughts?
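
The node-local store combined with the MD5 dirty check described in the last point could work roughly as sketched below. This is a minimal, hypothetical Python sketch, not Hadoop code: `InMemoryDfsStore` is an assumed stand-in for the proposed DFS cache (it simply maps a jar name to its bytes), and `LocalJarCache` is an assumed name for the per-node store. Neither is an existing Hadoop API.

```python
import hashlib
import tempfile
from pathlib import Path


class InMemoryDfsStore:
    """Hypothetical stand-in for the proposed DFS cache: jar name -> jar bytes."""

    def __init__(self):
        self.files = {}

    def md5(self, name):
        # Assumes the DFS cache can report an MD5 checksum for a stored file.
        return hashlib.md5(self.files[name]).hexdigest()

    def read(self, name):
        return self.files[name]


class LocalJarCache:
    """Hypothetical per-node store that re-fetches a jar only if it is missing or dirty."""

    def __init__(self, root, dfs):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.dfs = dfs
        self.fetches = 0  # number of full copies pulled from the DFS store

    def get(self, name):
        local = self.root / name
        remote_md5 = self.dfs.md5(name)
        if local.exists() and hashlib.md5(local.read_bytes()).hexdigest() == remote_md5:
            return local  # clean local copy: no transfer needed
        # Missing or dirty local copy: pull the current jar from the DFS store.
        local.write_bytes(self.dfs.read(name))
        self.fetches += 1
        return local


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        dfs = InMemoryDfsStore()
        dfs.files["job.jar"] = b"v1"
        cache = LocalJarCache(d, dfs)
        cache.get("job.jar")  # first run: copied from DFS
        cache.get("job.jar")  # rerun: served from the local store
        dfs.files["job.jar"] = b"v2"
        cache.get("job.jar")  # jar changed upstream: re-fetched
        print(cache.fetches)  # 2
```

In this sketch a rerun costs only a checksum comparison instead of a full jar copy, and because the checksum decides freshness, no explicit version numbers are needed. As the proposal assumes, the check happens only when a task starts, so a jar replaced mid-task would not be noticed.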

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
