[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

Todd Lipcon (JIRA) Mon, 26 Oct 2009 11:27:24 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770121#action_12770121 ]


Todd Lipcon commented on MAPREDUCE-967:
---------------------------------------

One note about this JIRA: it will also need a fix for Streaming. The 
common way that people ship scripts for streaming is the "-file foo.py" 
argument. This simply includes foo.py in the job jar and assumes it will be 
unpacked on the other side. With this patch, those files are no longer 
unpacked, which breaks the primary use case of the -file argument.

Two options to fix this issue:
# We could change -file to use DistributedCache instead. The fact that -file 
and -files do different things is confusing in the first place, but changing 
the behavior is a potentially breaking change, I think.
# We could change Streaming to add all of the -file paths to the new 
configuration parameter, so that the existing behavior is preserved.

If no one else has a preference, I'll go for option #2 above.
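
As a rough illustration of option #2, the sketch below extends an unpack 
pattern so that files shipped with -file are still extracted. It is only a 
sketch under assumptions: the comment above does not name the new 
configuration parameter or its default, so the default regex, the class name 
UnpackPatternSketch, and the method buildUnpackPattern are all hypothetical. 
It also assumes that -file entries sit at the top level of the job jar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop or of the attached patch.
class UnpackPatternSketch {
    // ASSUMPTION: by default only entries under classes/ and lib/ are unpacked.
    static final String ASSUMED_DEFAULT = "(?:classes/|lib/).*";

    // Builds a pattern matching the default entries plus each shipped file.
    // Only the base name of each path is used, on the assumption that
    // -file entries are stored at the root of the job jar.
    static Pattern buildUnpackPattern(String defaultRegex, List<String> shippedFiles) {
        StringBuilder regex = new StringBuilder("(?:").append(defaultRegex).append(")");
        for (String path : shippedFiles) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            // Quote the name so characters such as '.' are matched literally.
            regex.append('|').append(Pattern.quote(name));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = buildUnpackPattern(ASSUMED_DEFAULT, Arrays.asList("scripts/foo.py"));
        String[] entries = {"foo.py", "lib/dep.jar", "com/example/Big.class"};
        for (String entry : entries) {
            // Print whether each jar entry would be unpacked under this pattern.
            System.out.println(entry + " unpack=" + p.matcher(entry).matches());
        }
    }
}
```

With a pattern like this, the streaming script and the default entries are 
unpacked, while the bulk of the class files in a large job jar are skipped.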

> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>
>                 Key: MAPREDUCE-967
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-967-branch-0.20.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall-clock time and in 
> unnecessarily heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
