[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772500#action_12772500
 ] 

Vinod K V commented on MAPREDUCE-967:
-------------------------------------

bq. One note about this JIRA - it will need some fix for Streaming as well. The 
common way that people ship scripts for streaming is using the "-file foo.py" 
argument. This just includes foo.py in the job jar and assumes it will be 
unpacked on the other side. With this patch, it won't unpack those and breaks 
the -file argument's primary use case.

I've just looked at the documentation and, though it is not very explicit, files 
passed with {{-file}} are packaged into the job.jar (and are hence meant for 
small files), whereas {{-files}} and {{-archives}} can be used for large files. 
Going by that, I am +1 for the 2nd approach that you've outlined. If we want to 
be sure, we can make the above distinction explicit in the Forrest docs.

Will quickly look at your patch.

> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>
>                 Key: MAPREDUCE-967
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided
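
To make the interaction between a partial unjar and streaming's {{-file}} concrete, here is a minimal sketch in Python. It is not the attached patch. It assumes, purely for illustration, that a selective unpack extracts only entries under classes/ and lib/; the pattern, the helper names, and the jar contents are all hypothetical.

```python
import io
import re
import zipfile

# Hypothetical rule: only entries under classes/ or lib/ get unpacked.
UNPACK_PATTERN = re.compile(r"(?:classes/|lib/).*")


def build_job_jar(entries):
    """Build an in-memory jar (a zip archive) from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    buf.seek(0)
    return buf


def select_entries(jar_file, pattern=UNPACK_PATTERN):
    """Return the entry names that a selective unpack would extract."""
    with zipfile.ZipFile(jar_file) as jar:
        return [name for name in jar.namelist() if pattern.fullmatch(name)]


# A made-up job jar: one library, two classes, and a streaming script that
# was added at the jar root, as described for "-file foo.py".
job_jar = build_job_jar({
    "lib/dep.jar": b"stub",
    "classes/Mapper.class": b"stub",
    "org/example/Job.class": b"stub",
    "foo.py": b"print('hi')",
})

unpacked = select_entries(job_jar)
print(unpacked)
```

Under that assumed pattern, foo.py and the class at org/example/ stay inside the jar and are never written to local disk. That saves the unpacking cost for large jars, and it is also why a script shipped with {{-file}} would no longer appear in the task's working directory unless it is handled separately.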

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
