[ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515094 ]
Dennis Kubes commented on HADOOP-1622:
--------------------------------------

I got to thinking, always a dangerous thing, and I thought: if we are extending this for multiple jar files, why not other resources, like jars on the classpath, jars that contain a given class, and directories? Say we could specify one or more directories as a resource to be included in the job jar; then, when we do the merge, we would copy all resources from those directories into the job jar. This would allow us to do things like deploy executables, resource files, or multiple jar files across the cluster to be used in jobs. So say you have a custom executable you need to call in your MR job: you just drop it in a directory, include the directory as a job resource, and that executable gets deployed out onto the cluster and is available for that single job.

I went back and refactored the code to allow job resources as opposed to just jar files. A resource can be an absolute path to a jar file, a jar file on the classpath, a directory, or the name of a class that is contained in a jar on the classpath. As an added bonus, getJars and addJar now become getJobResources and addJobResource (we may need to come up with a different name, as this might be too easily confused with default and final resources in configuration), and we can keep getJar and setJar, as they now apply only to the final job jar file. I am doing final testing of this code right now and will have a patch up in just a little while.

> Hadoop should provide a way to allow the user to specify jar file(s) the user
> job depends on
> --------------------------------------------------------------------------------------------
>
>          Key: HADOOP-1622
>          URL: https://issues.apache.org/jira/browse/HADOOP-1622
>      Project: Hadoop
>   Issue Type: Improvement
>     Reporter: Runping Qi
>  Attachments: multipleJobJars.patch
>
>
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the
> user to specify that.
> A workaround is to re-package all the dependent jars into a new jar,
> or to put the dependent jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore,
> if the user does not own the main function
> (as in the case when the user uses Aggregate, datajoin, or streaming), the
> user has to re-package those system jar files too.
> It is much desired that Hadoop provide a clean and simple way for the user
> to specify a list of dependent jar files at the time
> of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
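The comment above describes resolving a job-resource string into one of four kinds (jar path, classpath jar, directory, or class name) before merging it into the job jar. A minimal sketch of that classification step, assuming hypothetical names (`JobResourceSketch`, `classify`, the `Kind` enum) that are not taken from the actual patch:

```java
// Hypothetical sketch of the job-resource classification described in the
// comment above -- NOT the actual HADOOP-1622 patch. All names here are
// invented for illustration.
import java.io.File;

public class JobResourceSketch {

    enum Kind { JAR_FILE, DIRECTORY, CLASS_ON_CLASSPATH, CLASSPATH_JAR, UNKNOWN }

    // Decide how a resource string would be treated before being merged
    // into the final job jar.
    static Kind classify(String resource) {
        File f = new File(resource);
        if (f.isDirectory()) {
            return Kind.DIRECTORY;        // copy its contents into the job jar
        }
        if (f.isFile() && resource.endsWith(".jar")) {
            return Kind.JAR_FILE;         // a path to an existing jar file
        }
        try {
            Class.forName(resource);      // a class whose enclosing jar is added
            return Kind.CLASS_ON_CLASSPATH;
        } catch (ClassNotFoundException e) {
            // not a loadable class name; fall through
        }
        if (resource.endsWith(".jar")) {
            return Kind.CLASSPATH_JAR;    // look the jar up on the classpath
        }
        return Kind.UNKNOWN;
    }
}
```

This keeps the existing getJar/setJar pair meaningful for the final merged jar, while addJobResource would accept any of the four forms above.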