[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

Todd Lipcon (JIRA) Mon, 07 Dec 2009 22:14:44 -0800

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Todd Lipcon updated MAPREDUCE-967:
----------------------------------

    Attachment: mapreduce-967.txt

bq. we should definitely document clearly the points you've mentioned above 
w.r.t the classpath.

You're totally right, and I actually did this and forgot to upload the patch! 
My bad. Here's a new one.

bq. makes this JIRA issue an incompatible change

Yes, this is technically incompatible. But I think it's not a problem for the 
following reasons:
- Since job.jar is itself added to the classpath, the standard classloader will 
pick up anything inside job.jar just as if it were expanded and the resulting 
dir were put on the classpath
- The only other people this should break are those who are using java.io (or 
other non-classpath-related access methods) to access things unpacked from the 
jar. The new configuration parameter is a suitable workaround for them (as 
demonstrated by Streaming). In this case, what's on the classpath doesn't 
matter since they're not using a ClassLoader anyhow.
- Non-Java applications are the only ones for whom the above two points don't 
apply, but non-Java applications don't have any concept of a classpath, so 
this shouldn't be a problem for them.
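
To illustrate the first two points, here is a minimal, self-contained sketch 
(not code from the patch). It assumes a made-up resource name, 
conf/settings.txt, and a throwaway jar built in a temp directory; the class 
and method names are hypothetical. It shows that a ClassLoader resolves the 
same resource whether its classpath entry is the jar itself or a directory 
holding the expanded contents, while plain file access only works if the 
entry was actually unpacked.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class JarClasspathDemo {

    // Build a small jar named job.jar containing one text entry.
    // The entry name and content are illustrative, not taken from Hadoop.
    static Path writeJar(Path dir, String entryName, String content) throws IOException {
        Path jar = dir.resolve("job.jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry(entryName));
            jos.write(content.getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    // Resolve a resource through a class loader whose only classpath entry is
    // `root`, which may be either a jar file or an existing directory.
    // Returns null if the class loader cannot find the resource.
    static String readViaClassLoader(Path root, String name) throws IOException {
        URL[] classpath = { root.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(classpath, null);
             InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return null;
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("jar-demo");
        String name = "conf/settings.txt";

        // Case 1: the jar itself is the classpath entry (nothing unpacked).
        Path jar = writeJar(tmp, name, "hello");

        // Case 2: the same content expanded into a directory on the classpath.
        Path expanded = Files.createDirectories(tmp.resolve("expanded"));
        Files.createDirectories(expanded.resolve("conf"));
        Files.write(expanded.resolve(name), "hello".getBytes(StandardCharsets.UTF_8));

        System.out.println("via jar:      " + readViaClassLoader(jar, name));
        System.out.println("via directory: " + readViaClassLoader(expanded, name));

        // Plain file access next to the jar finds nothing, because the entry
        // was never unpacked there. This is the java.io case that would break.
        System.out.println("file exists next to jar: " + Files.exists(tmp.resolve(name)));
    }
}
```

Code that goes through the ClassLoader behaves the same in both cases; only 
code that opens the unpacked file directly depends on the jar having been 
expanded on disk.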

Philosophically, isn't pre-1.0 exactly when we should be making these minor 
incompatible changes for the purposes of code cleanliness? Compared to the 
other drastic changes we're putting in 0.22, this is hardly a showstopper. I 
don't see anything *against* the change you're requesting, except that I think 
we should do everything in our power now to clean up the code before we call 
Hadoop 1.0. If I'm the only one with this philosophy, I'll acquiesce, but I 
think the sloppy classpath is just as likely to come back to bite us as fixing 
it.

> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>
>                 Key: MAPREDUCE-967
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
