[
https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tom White updated MAPREDUCE-1700:
---------------------------------
Attachment: MAPREDUCE-1700-ccl.patch
bq. Most of the class loader issues stem from long running containers that need
to dynamically load/unload classes.
Also, the case we are talking about does not have the complex classloader trees
that app servers have, so there are no sibling class sharing issues. In the
task JVM there is only a single user app, so the classloader hierarchy is
linear (boot, extension, system, job).
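To make the linear hierarchy concrete, here is a minimal sketch that walks the parent chain from a job-style classloader upwards. The names LoaderChain and newJobLoader are made up for illustration and are not part of the patch; the "job" loader is just an empty URLClassLoader whose parent is the system classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

final class LoaderChain {

    /**
     * Returns the loaders from 'start' up to the top of the chain.
     * The boot loader is represented by null in the JDK, so it is not listed.
     */
    static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> out = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            out.add(cl);
        }
        return out;
    }

    /** Hypothetical stand-in for the job classloader: a child of the system loader. */
    static ClassLoader newJobLoader() {
        return new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
    }

    public static void main(String[] args) {
        // Prints one loader per line, job loader first, system loader next.
        for (ClassLoader cl : chain(newJobLoader())) {
            System.out.println(cl);
        }
    }
}
```

Each loader has exactly one parent and there are no siblings, which is the linear shape described above. On Java 9 and later the extension loader is replaced by the platform loader, so the names printed differ from the list given here.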
There are a few cases where certain APIs make assumptions about which
classloader to use:
* *The system classloader*. For example, URL stream handlers are loaded by the
classloader that loaded java.net.URL (boot), or the system classloader. So if a
task registered a URL stream handler whose class was in the job JAR, the
handler would not be found, because the class is visible only to the job
classloader, not the system classloader. In this case, the workaround is to
implement a factory and call URL.setURLStreamHandlerFactory().
* *The caller's current classloader*. For example, java.util.ResourceBundle
uses the caller's current classloader, so if the framework tries to load a
bundle then the bundle (e.g. a localization bundle) would not be found if it
were in the job JAR, since the system classloader (which loaded the framework
class) can't see the job classloader's classes. As it happens, MR counters use
resource bundles; however, they explicitly use the context classloader, so this
problem doesn't occur (see org.apache.hadoop.mapreduce.util.ResourceBundles).
(Also, I imagine the use of resource bundles to localize counter names in the
job JAR is very rare.)
* *The context classloader*. For example, JAXP uses the context classloader to
load the DocumentBuilderFactory specified in a system property. This case is
covered by setting the context classloader to be the job classloader for the
duration of the task (my latest patch does this). Most APIs that involve
classloaders use the context classloader these days.
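Two of the workarounds in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the patch: the class names, the "jobdemo" protocol and the withContextClassLoader helper are all made up here. The first part is a URLStreamHandlerFactory that can be installed explicitly, so handler lookup no longer depends on the system classloader; the second sets the context classloader for the duration of a task and restores it afterwards.

```java
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.function.Supplier;

final class ClassLoaderWorkarounds {

    /** System classloader case: a factory for a made-up "jobdemo" protocol. */
    static final class JobHandlerFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            if (!"jobdemo".equals(protocol)) {
                return null; // not ours: the JDK falls back to its default lookup
            }
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) {
                    return new URLConnection(u) {
                        @Override
                        public void connect() {
                            // nothing to connect to in this sketch
                        }
                    };
                }
            };
        }
    }

    /**
     * Context classloader case: run 'task' with 'jobLoader' as the thread's
     * context classloader, then restore the previous one even if the task throws.
     */
    static <T> T withContextClassLoader(ClassLoader jobLoader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        current.setContextClassLoader(jobLoader);
        try {
            return task.get();
        } finally {
            current.setContextClassLoader(saved);
        }
    }
}
```

The factory would be installed with URL.setURLStreamHandlerFactory(new ClassLoaderWorkarounds.JobHandlerFactory()). That call can be made at most once per JVM, so it has to happen in one well-defined place.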
So all of these cases can be handled. Also note that the job classloader is not
used by default; to enable it, set mapreduce.job.isolated.classloader to true
for your job.
The latest patch handles the case of embedded lib and classes directories in
the JAR, as well as distributed cache files and archives. The unit test passes
(and fails with a NoSuchMethodError due to the class incompatibility if
mapreduce.job.isolated.classloader is set to false). So I think it is pretty
close now - the main thing left to do is sort out the build for the test, which
relies on the MR examples module.
> User supplied dependencies may conflict with MapReduce system JARs
> ------------------------------------------------------------------
>
> Key: MAPREDUCE-1700
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Reporter: Tom White
> Assignee: Tom White
> Attachments: MAPREDUCE-1700-ccl.patch, MAPREDUCE-1700-ccl.patch,
> MAPREDUCE-1700.patch, MAPREDUCE-1700.patch
>
>
> If user code has a dependency on a version of a JAR that is different to the
> one that happens to be used by Hadoop, then it may not work correctly. This
> happened with user code using a different version of Avro, as reported
> [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852081#action_12852081].
> The problem is analogous to the one that application servers have with WAR
> loading. Using a specialized classloader in the Child JVM is probably the way
> to solve this.