[
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753455#action_12753455
]
Todd Lipcon commented on MAPREDUCE-967:
---------------------------------------
Currently, TaskTracker.localizeJob completely unjars job.jar in jobCacheDir.
TaskRunner then appends
<jobCacheDir>/classes:<jobCacheDir>/lib/*:<jobCacheDir>/ to the task classpath.
Instead, I propose that we unpack only the classes/ and lib/ portions of
job.jar, and add <jobCacheDir>/job.jar to the task classpath in place of
<jobCacheDir>/.
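To make the proposal concrete, here is a minimal sketch of the selective unpack and the resulting classpath. It is not the attached patch: the class name SelectiveUnjar and the methods unjarClassesAndLib and taskClasspath are hypothetical names chosen for illustration, and the sketch assumes job.jar has already been copied to <jobCacheDir>/job.jar.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Hadoop: illustrates the idea only.
public class SelectiveUnjar {

    /**
     * Extracts only the entries under classes/ and lib/ from the jar into
     * destDir and returns the number of files written. Everything else
     * stays packed inside the jar.
     */
    public static int unjarClassesAndLib(Path jar, Path destDir) throws IOException {
        int extracted = 0;
        Path root = destDir.toAbsolutePath().normalize();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                boolean wanted = name.startsWith("classes/") || name.startsWith("lib/");
                if (entry.isDirectory() || !wanted) {
                    continue; // skip directories and everything outside classes/ and lib/
                }
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) {
                    // Guard against entries such as lib/../../evil escaping destDir.
                    throw new IOException("Entry escapes destination: " + name);
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jarFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted++;
            }
        }
        return extracted;
    }

    /**
     * Builds the proposed classpath suffix: classes/, lib/*, and job.jar
     * itself in place of the fully unpacked <jobCacheDir>/ directory.
     */
    public static String taskClasspath(Path jobCacheDir) {
        return String.join(File.pathSeparator,
                jobCacheDir.resolve("classes").toString(),
                jobCacheDir.resolve("lib") + File.separator + "*",
                jobCacheDir.resolve("job.jar").toString());
    }
}
```

With job.jar on the classpath, classes and resources stored at the root of the jar remain loadable straight from the archive, so they no longer need to be written out as individual files.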
While we're at it, I'm not sure I see the purpose of the "classes/" directory -
it is not part of the standard JAR layout, and seems unnecessary. But that
issue is orthogonal to this ticket.
Attaching a preliminary patch against branch-20, though this should go into
trunk and probably not the branch. I just want to test this on a real workload
first.
> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tasktracker
> Affects Versions: 0.21.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> In practice we have seen some users submitting job jars that consist of
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning
> up after them has a significant cost (both in wall-clock time and in
> unnecessarily heavy disk utilization). This cost can be easily avoided.