[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753455#action_12753455
 ] 

Todd Lipcon commented on MAPREDUCE-967:
---------------------------------------

Currently, TaskTracker.localizeJob completely unjars job.jar in jobCacheDir. 
TaskRunner then appends 
<jobCacheDir>/classes:<jobCacheDir>/lib/*:<jobCacheDir>/ to the task classpath. 
Instead, I propose that we only unpack the classes/ and lib/ portions of 
job.jar, and add <jobCacheDir>/job.jar to the task classpath in lieu of 
<jobCacheDir>/.
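
To make the proposal above concrete, here is a rough sketch of the selective 
unpacking and the resulting classpath. It is not the attached patch: the class 
SelectiveUnjar and the methods unjarClassesAndLib and taskClasspath are 
hypothetical names, and it uses plain java.util.jar rather than any Hadoop 
utility. It assumes job.jar has already been copied into jobCacheDir.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class SelectiveUnjar {

    /**
     * Extracts only the entries under classes/ and lib/ from the jar into
     * destDir, leaving everything else packed. Returns the number of files
     * written.
     */
    public static int unjarClassesAndLib(File jar, File destDir) throws IOException {
        Path root = destDir.toPath().toAbsolutePath().normalize();
        int extracted = 0;
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!name.startsWith("classes/") && !name.startsWith("lib/")) {
                    continue; // stays inside job.jar, which goes on the classpath
                }
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + name);
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jarFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted++;
            }
        }
        return extracted;
    }

    /**
     * Builds the proposed classpath suffix: classes, lib/*, and job.jar
     * itself in place of the bare jobCacheDir.
     */
    public static String taskClasspath(File jobCacheDir) {
        String sep = File.pathSeparator;
        return new File(jobCacheDir, "classes") + sep
            + new File(jobCacheDir, "lib") + File.separator + "*" + sep
            + new File(jobCacheDir, "job.jar");
    }
}
```

With a jar of 10,000+ top-level classes, a loop like this only writes the 
handful of files under classes/ and lib/, and the remaining classes are loaded 
straight from job.jar.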

While we're at it, I'm not sure I see the purpose of the "classes/" directory - 
this is not standard Jar layout by any means, and seems unnecessary. But that 
issue is orthogonal to this ticket.

Attaching a preliminary patch against branch-20, though this should go into 
trunk and probably not the branch. I just want to test this on a real workload 
first.

> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>
>                 Key: MAPREDUCE-967
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
