[ https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050054#comment-14050054 ]
Sangjin Lee commented on MAPREDUCE-5957:
----------------------------------------
The gist of this issue is the interaction between Configuration.getClass() and
the thread context classloader (TCCL). Currently MRApps.setJobClassLoader() sets
both the configuration classloader and the TCCL at the same time, so once
setJobClassLoader() is called, the job classloader becomes available in both
contexts.
MAPREDUCE-5751 was caused because the job classloader was made available *too
early as the TCCL*. This issue is caused because the job classloader is made
available *too late as the configuration classloader*.
The normal classloading scheme (one class initializing another class via normal
use or even Class.forName) is unaffected by this if my understanding is correct.
I see two possible approaches for this:
(1) separate the timing of setting the job classloader as the configuration
classloader and the TCCL
While setting the TCCL should be delayed as long as possible (i.e. the current
timing), I think the job classloader can be installed as the configuration
classloader much earlier. If the configuration loads a user class, that's
precisely what we need. If it loads a system class, the job classloader will
delegate to its parent anyhow. I don't see any harm in setting the configuration
classloader early.
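A minimal sketch of approach (1), using only the JDK so it can run without
Hadoop on the classpath. MiniConf is a hypothetical stand-in for
Configuration (only the classloader field and a by-name lookup are modeled,
with method names chosen to mirror setClassLoader/getClassByName), and an
empty URLClassLoader is assumed as a stand-in for the job classloader. It
illustrates that installing the loader on the configuration leaves the TCCL
alone, and that system classes still resolve through parent delegation.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration; only the
// classloader handling relevant to this discussion is modeled.
class MiniConf {
    private ClassLoader classLoader = MiniConf.class.getClassLoader();

    void setClassLoader(ClassLoader cl) {
        this.classLoader = cl;
    }

    ClassLoader getClassLoader() {
        return classLoader;
    }

    // Resolves a class name through the configuration's own classloader,
    // not through the thread context classloader.
    Class<?> getClassByName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, classLoader);
    }
}

class EarlyConfClassLoaderSketch {
    // Assumed stand-in for the job classloader: no URLs of its own, with the
    // system classloader as parent.
    static ClassLoader newJobClassLoader() {
        return new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
    }

    public static void main(String[] args) throws Exception {
        ClassLoader tcclBefore = Thread.currentThread().getContextClassLoader();

        MiniConf conf = new MiniConf();
        // Early step: install the job classloader on the configuration only.
        conf.setClassLoader(newJobClassLoader());

        // A system class is still found because the loader delegates to its parent.
        Class<?> c = conf.getClassByName("java.util.ArrayList");
        System.out.println("loaded " + c.getName());

        // The TCCL was not touched by the step above; it can be set later.
        boolean untouched = Thread.currentThread().getContextClassLoader() == tcclBefore;
        System.out.println("TCCL untouched: " + untouched);
    }
}
```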
(2) set and unset the job classloader around the code that loads classes from
the configuration
Identify the code points in MRAppMaster where Configuration.getClass() is
needed, and set and unset the job classloader around them. Although this would
also solve the problem, the downside is that every such call site has to be
identified and wrapped with the set/unset by hand. This is potentially
brittle.
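For comparison, the set/unset pattern of approach (2) might look roughly like
the following. This is a JDK-only sketch, not MRAppMaster code: the helper
name callWithContextClassLoader is hypothetical, an empty URLClassLoader is
assumed as the job classloader, and the TCCL is used as the scoped loader
(the same try/finally shape would apply to the configuration classloader).

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

class ScopedClassLoaderSketch {
    // Runs the action with the given loader installed as the TCCL and
    // restores the previous TCCL afterwards, even if the action throws.
    static <T> T callWithContextClassLoader(ClassLoader loader, Callable<T> action)
            throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed stand-in for the job classloader.
        ClassLoader jobLoader =
            new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());

        // Only the wrapped lookup sees the job classloader.
        Class<?> c = callWithContextClassLoader(jobLoader, () ->
            Class.forName("java.util.HashMap", true,
                Thread.currentThread().getContextClassLoader()));
        System.out.println("loaded " + c.getName());
    }
}
```

Every call site that loads a class from the configuration would need a wrapper
like this, which is where the brittleness comes from.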
I think (1) is a more robust solution to this problem. Do you see an issue with
taking that approach?
I don't think the task (YarnChild) is affected by this.
> AM throws ClassNotFoundException with job classloader enabled if custom
> output format/committer is used
> -------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
>
> With the job classloader enabled, the MR AM throws ClassNotFoundException if
> a custom output format class is specified.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
>     at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
>     ... 8 more
> Caused by: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
>     ... 10 more
> {noformat}