[ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578454#action_12578454 ]

Dennis Kubes commented on HADOOP-1622:
--------------------------------------

I have not resumed working on this yet; I am currently neck-deep in 
reworking NIO for Hadoop RPC. I was planning to pick this back up as soon as I 
had completed the NIO code, in the next 2-3 days. I would like to continue 
working on this if possible. When is 0.17 scheduled for release?

Owen, the first pass at this didn't distinguish between jar and regular files 
on the command line. Instead, there was detection code that identified each 
resource's type. The first pass also supported directories as well as files (I 
don't know if you are including those under "file"). I think the ability to 
include directories for job input is extremely important. What were the 
special cases that you were seeing?
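
Roughly, the kind of detection I mean looks like the following minimal
sketch against the FileSystem API (the class and method names here are
illustrative, not the actual patch code):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ResourceClassifier {

      public enum ResourceType { JAR, DIRECTORY, FILE }

      // Classify a command-line resource as a jar, a directory, or a
      // plain file. Going through Path.getFileSystem() means the same
      // check works on HDFS, S3, or the local filesystem.
      public static ResourceType classify(Path p, Configuration conf)
          throws IOException {
        FileSystem fs = p.getFileSystem(conf);
        if (fs.getFileStatus(p).isDir()) {
          return ResourceType.DIRECTORY;
        }
        // Extension check; a stricter version could open the file and
        // look for the zip magic header instead.
        return p.getName().endsWith(".jar") ? ResourceType.JAR
                                            : ResourceType.FILE;
      }
    }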

The idea behind this code is that, much like streaming, you could upload and 
cache files from any type of resource (file, directory, jar, etc.) on any file 
system. So, for instance, people could store common jars or file resources on 
S3 and pull them down into a job.
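
For example, pulling a shared jar from S3 into a job might look
something like this (a sketch only: the bucket and path are made up,
and I am using the existing DistributedCache API to illustrate the
idea):

    import java.net.URI;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class S3ResourceExample {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(S3ResourceExample.class);
        // A common jar kept on S3; the framework copies it down to
        // each task node before the tasks run.
        DistributedCache.addCacheFile(
            new URI("s3://shared-bucket/lib/common-utils.jar"), job);
        // ... set mapper/reducer, input/output paths, then submit ...
      }
    }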



> Hadoop should provide a way to allow the user to specify jar file(s) the user 
> job depends on
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1622
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Mahadev konar
>             Fix For: 0.17.0
>
>         Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch, 
> HADOOP-1622-6.patch, HADOOP-1622-7.patch, HADOOP-1622-8.patch, 
> HADOOP-1622-9.patch, multipleJobJars.patch, multipleJobResources.patch, 
> multipleJobResources2.patch
>
>
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the 
> user to specify that. 
> A workaround is to re-package all the dependent jars into a new jar 
> or to put the dependent jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore, 
> if the user does not own the main function 
> (as in the case when the user uses Aggregate, datajoin, or streaming), the 
> user has to re-package those system jar files too.
> It is much desired that Hadoop provide a clean and simple way for the user 
> to specify a list of dependent jar files at the time 
> of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar 
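
For illustration, the lib-dir workaround described above amounts to
nesting the dependencies inside the job jar (jar names taken from the
example; the task runner unpacks the job jar and puts anything under
lib/ on the task classpath):

    myjob.jar
    |-- MyMapper.class, MyReducer.class, ...
    `-- lib/
        |-- j1.jar
        `-- j2.jar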

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.