[
https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mahadev konar updated HADOOP-1622:
----------------------------------
Attachment: HADOOP-1622_1.patch
Attaching a patch for this feature. It does not include unit tests yet; I am
still writing them and will upload an updated patch by the end of the day.
This patch enhances the Hadoop command line for job submission, so you can say:
- bin/hadoop jar -files <comma-separated files> -libjars <comma-separated libs>
-archives <comma-separated archives>
- these options are all optional, and the command line remains backwards compatible
- the patch uses CLI for command-line parsing
- it uses DistributedCache to copy files locally to the tasks
- it supports URIs in the command-line arguments
- if the files have already been uploaded to the HDFS used by the JobTracker, it
does not recopy them -- there is a tiny catch here: since the URIs for the
remote file system and the one the JT uses are matched as strings, the files
might be copied even though it is the same DFS (e.g.
hdfs://hostname1:port != hdfs://hostname1.fullyqualifiedname:port)
- the command-line files, archives, and libjars are stored temporarily in the
HDFS job directory, from which they are copied locally.
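As an illustration of the enhanced command line described above, a submission
might look like the following (the jar, class, and file names here are
hypothetical, not taken from the patch):

```shell
# Hypothetical job submission using the new options (per the syntax above):
#   -files    ships plain files to each task's working directory
#   -libjars  adds extra jars to the task classpath
#   -archives ships archives that are unpacked on the task nodes
bin/hadoop jar \
  -files lookup.txt,conf/extra.properties \
  -libjars lib/dep1.jar,lib/dep2.jar \
  -archives dicts.zip \
  myjob.jar org.example.MyJob input output
```

All three options take comma-separated lists, and any of them may be omitted.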
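The string-matching caveat above is easy to reproduce: two URIs can name the
same DFS yet compare unequal as strings (hostnames below are made up):

```shell
# Two URIs that may point at the same HDFS instance but differ as strings,
# so a plain string comparison treats them as different file systems
# and triggers an unnecessary copy.
a="hdfs://hostname1:9000/user/alice/dep.jar"
b="hdfs://hostname1.example.com:9000/user/alice/dep.jar"
if [ "$a" = "$b" ]; then
  echo "same"       # copy would be skipped
else
  echo "different"  # copy happens even though the DFS may be the same
fi
```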
> Hadoop should provide a way to allow the user to specify jar file(s) the user
> job depends on
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-1622
> URL: https://issues.apache.org/jira/browse/HADOOP-1622
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Runping Qi
> Assignee: Mahadev konar
> Fix For: 0.17.0
>
> Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch,
> HADOOP-1622-6.patch, HADOOP-1622-7.patch, HADOOP-1622-8.patch,
> HADOOP-1622-9.patch, HADOOP-1622_1.patch, multipleJobJars.patch,
> multipleJobResources.patch, multipleJobResources2.patch
>
>
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the
> user to specify that.
> A workaround is to re-package all the dependent jars into a new jar
> or to put the dependent jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore,
> if the user does not own the main function
> (as is the case when the user uses Aggregate, datajoin, or streaming), the
> user has to re-package those system jar files too.
> It is much desired that Hadoop provide a clean and simple way for the user
> to specify a list of dependent jar files at the time
> of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.