[ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515094 ]

Dennis Kubes commented on HADOOP-1622:
--------------------------------------

I got to thinking, always a dangerous thing, and I thought: if we are extending 
this to multiple jar files, why not other resources as well, such as jars on the 
classpath, jars that contain a given class, and directories?  Say we could 
specify one or more directories as resources to be included in the job jar; 
then, when we do the merge, we would copy all resources from those directories 
into the job jar.  This would allow us to do things like deploy executables, 
resource files, or multiple jar files across the cluster to be used in jobs.  So 
say you have a custom executable you need to call in your MR job: you just drop 
it in a directory, include the directory as a job resource, and that executable 
gets deployed out onto the cluster and is available for that single job.

I went back and refactored the code to allow job resources as opposed to just 
jar files.  A resource can be an absolute path to a jar file, a jar file on the 
classpath, a directory, or the name of a class that is contained in a jar on the 
classpath.  As an added bonus, getJars and addJar now become getJobResources and 
addJobResource (we may need to come up with a different name, as this might be 
too easily confused with default and final resources in configuration), and we 
can keep getJar and setJar as they now apply only to the final job jar file.
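To make the four resource kinds concrete, here is a minimal sketch of how a 
resource string might be classified before the merge.  This is purely 
illustrative and not the actual patch: the class and enum names 
(JobResourceKind, classify) are made up for this example, and the real code 
would additionally search the classpath for bare jar names and class names.

```java
import java.io.File;

// Hypothetical sketch: classify a job-resource string into the four
// kinds described above. Names here are invented for illustration.
public class JobResourceKind {
  enum Kind { JAR_PATH, CLASSPATH_JAR, DIRECTORY, CLASS_NAME }

  static Kind classify(String resource) {
    File f = new File(resource);
    if (f.isDirectory()) {
      return Kind.DIRECTORY;        // copy the directory's contents into the job jar
    }
    if (resource.endsWith(".jar")) {
      return f.isFile()
          ? Kind.JAR_PATH           // an existing jar file, merged directly
          : Kind.CLASSPATH_JAR;     // a bare jar name, to be resolved on the classpath
    }
    return Kind.CLASS_NAME;         // a class name; find the jar that contains it
  }

  public static void main(String[] args) {
    System.out.println(classify("some-lib.jar"));      // CLASSPATH_JAR (no such file here)
    System.out.println(classify("org.example.MyMapper")); // CLASS_NAME
  }
}
```

The point is just that a single addJobResource(String) entry point can cover 
all four cases by dispatching on what the string refers to.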

I am doing final testing of this code right now and will have a patch up in 
just a little while.

> Hadoop should provide a way to allow the user to specify jar file(s) the user 
> job depends on
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1622
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>         Attachments: multipleJobJars.patch
>
>
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the 
> user to specify that. 
> A workaround is to re-package all the dependent jars into a new jar 
> or put the dependent jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore, 
> if the user does not own the main function 
> (as in the case when the user uses Aggregate, datajoin, or streaming), the 
> user has to re-package those system jar files too.
> It is much desired that hadoop provide a clean and simple way for the user 
> to specify a list of dependent jar files at the time 
> of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
