[
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062815#comment-14062815
]
Jason Lowe commented on MAPREDUCE-5957:
---------------------------------------
Yeah, seeing the second patch I can understand why you initially gravitated to
the first approach. I agree that it is quite likely things will break as the
code is maintained. Not only do we have to flip-flop each time a new custom
class is loaded but also each time we invoke a method on a user-provided
instance. So if a new committer method is added or what-not that needs to be
called before we finally throw the classloader switch for good then that's
another necessary flip-flop. That's annoying and probably won't be remembered
when the new method invocation is added.
If we do stick with the flip-flop approach then it'd be nice if we had a nice
way to wrap such code with a common flip-flop wrapper. I'm thinking something
akin to how UserGroupInformation.doAs works so we can wrap code with common
logic and don't have to copy-n-paste the wrapper everywhere. However the
wrapped code has to be marshaled in the form of a Runnable or what-not, so that
might not be much better in the end.
So I guess it comes down to weighing the likelihood this will ever be needed in
practice or if simply setting the conf classloader will "just work." I'm not
as worried about the reflection case since that's probably rare to do without
already leveraging Hadoop's conf class property processing. However I'm
worried about thread creation since I could see a case where a
committer/speculator/whatever creates some threads in their constructor or some
other method we invoke before throwing the final classloader switch. If those
threads end up inheriting the system TCCL instead of the job classloader TCCL
then we have a problem. If that indeed is what would happen then it comes down
to how likely is it that user code will want to create threads in
constructors/methods that are invoked before the final classloader switch. I
don't know offhand, unfortunately.
If we can convince ourselves the thread use by user methods invoked before the
final loader switch is either a non-issue or super unlikely then I think we
should go for the simpler fix to set the conf loader early. Otherwise I think
we may be stuck doing the flip flop case just for correctness sake, and
hopefully we can make that as painless and obvious as possible.
Which reminds me, the CommitterEventHandler creates a thread with which it
invokes methods on the output committer. That thread is going to have the
system TCCL since it was created before the final classloader switch, correct?
If so would we also have problems if the output committer does lazy thread
creation or class loading when commit methods are invoked as the job runs?
Seems like the CommitterEventHandler event handling thread needs the job
classloader, if specified.
> AM throws ClassNotFoundException with job classloader enabled if custom
> output format/committer is used
> -------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch,
> MAPREDUCE-5957.patch
>
>
> With the job classloader enabled, the MR AM throws ClassNotFoundException if
> a custom output format class is specified.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> com.foo.test.TestOutputFormat not found
> at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
> at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
> at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
> Class com.foo.test.TestOutputFormat not found
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
> at
> org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
> at
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
> ... 8 more
> Caused by: java.lang.ClassNotFoundException: Class
> com.foo.test.TestOutputFormat not found
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
> ... 10 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)