[ https://issues.apache.org/jira/browse/HADOOP-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gautam Kowshik updated HADOOP-1032:
-----------------------------------

    Affects Version/s: 0.11.2

> Support for caching Job JARs
> -----------------------------
>
>                 Key: HADOOP-1032
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1032
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.11.2
>            Reporter: Gautam Kowshik
>            Priority: Minor
>
> Jobs often need to be rerun a number of times, for example a job that reads
> from crawled data time and again, so having to upload the job jars to every
> node on each run is cumbersome. We need a caching mechanism to boost
> performance. Here are the features for job-specific caching of jars/conf files:
> - Ability to resubmit jobs with jars without having to propagate the same jar
> to all nodes.
>   The idea is to keep a store (path mentioned by the user in job.xml?) local to
> the task node so as to speed up task initiation on tasktrackers. This assumes
> that the jar does not change during an MR task.
> - An independent DFS store to upload jars to (Distributed File Cache?) that
> does not clean up between jobs.
>   This might need user-level configuration to tell the jobclient to upload
> files to the DFSCache instead of the DFS.
> https://issues.apache.org/jira/browse/HADOOP-288 facilitates this. Our local
> cache can be a client of the DFS Cache.
> - A standard cache mechanism that checks for changes in the local store and
> picks from the DFS if the local copy is found dirty.
>   This does away with versioning. The DFSCache supports an MD5 checksum
> check; we can use that.
> Anything else? Suggestions? Thoughts?
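
The third point in the description (check the local store, and pick up the DFS copy only when the local one is dirty) could look roughly like the sketch below. It is illustrative only: Python is used for brevity rather than Hadoop's Java, `md5_of` and `localize_jar` are hypothetical names, and the "DFS" copy is simulated with an ordinary local path rather than any real DFS or DFSCache API.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def localize_jar(remote_jar: Path, remote_md5: str, cache_dir: Path) -> Tuple[Path, bool]:
    """Return (local_path, fetched) for a job jar.

    remote_jar stands in for the copy held in the DFS-side store
    (assumption: a plain local path, so the sketch runs without a cluster).
    remote_md5 is the checksum the DFS-side store reports for that copy.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_jar = cache_dir / remote_jar.name

    # Cache hit: the local copy exists and its checksum matches, so skip the copy.
    if local_jar.exists() and md5_of(local_jar) == remote_md5:
        return local_jar, False

    # Missing or dirty: fetch the jar again from the (simulated) DFS store.
    shutil.copyfile(remote_jar, local_jar)
    return local_jar, True
```

A tasktracker-side caller would invoke `localize_jar` once per job submission; only the first run, or a run after the jar has changed, pays for the copy. Comparing checksums instead of version numbers is what lets the scheme do away with explicit versioning.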