[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062815#comment-14062815
 ] 

Jason Lowe commented on MAPREDUCE-5957:
---------------------------------------

Yeah, seeing the second patch I can understand why you initially gravitated to 
the first approach.  I agree that it is quite likely things will break as the 
code is maintained.  Not only do we have to flip-flop each time a new custom 
class is loaded but also each time we invoke a method on a user-provided 
instance.  So if a new committer method is added or what-not that needs to be 
called before we finally throw the classloader switch for good then that's 
another necessary flip-flop.  That's annoying and probably won't be remembered 
when the new method invocation is added.

If we do stick with the flip-flop approach then it'd be nice if we had a nice 
way to wrap such code with a common flip-flop wrapper.  I'm thinking something 
akin to how UserGroupInformation.doAs works so we can wrap code with common 
logic and don't have to copy-n-paste the wrapper everywhere.  However the 
wrapped code has to be marshaled in the form of a Runnable or what-not, so that 
might not be much better in the end.

So I guess it comes down to weighing the likelihood this will ever be needed in 
practice or if simply setting the conf classloader will "just work."  I'm not 
as worried about the reflection case since that's probably rare to do without 
already leveraging Hadoop's conf class property processing.  However I'm 
worried about thread creation since I could see a case where a 
committer/speculator/whatever creates some threads in their constructor or some 
other method we invoke before throwing the final classloader switch.  If those 
threads end up inheriting the system TCCL instead of the job classloader TCCL 
then we have a problem.  If that indeed is what would happen then it comes down 
to how likely is it that user code will want to create threads in 
constructors/methods that are invoked before the final classloader switch.  I 
don't know offhand, unfortunately.

If we can convince ourselves the thread use by user methods invoked before the 
final loader switch is either a non-issue or super unlikely then I think we 
should go for the simpler fix to set the conf loader early.  Otherwise I think 
we may be stuck doing the flip flop case just for correctness sake, and 
hopefully we can make that as painless and obvious as possible.

Which reminds me, the CommitterEventHandler creates a thread with which it 
invokes methods on the output committer.  That thread is going to have the 
system TCCL since it was created before the final classloader switch, correct?  
If so would we also have problems if the output committer does lazy thread 
creation or class loading when commit methods are invoked as the job runs?  
Seems like the CommitterEventHandler event handling thread needs the job 
classloader, if specified.



> AM throws ClassNotFoundException with job classloader enabled if custom 
> output format/committer is used
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5957
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch, 
> MAPREDUCE-5957.patch
>
>
> With the job classloader enabled, the MR AM throws ClassNotFoundException if 
> a custom output format class is specified.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
> com.foo.test.TestOutputFormat not found
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> Class com.foo.test.TestOutputFormat not found
>       at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
>       at 
> org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
>       ... 8 more
> Caused by: java.lang.ClassNotFoundException: Class 
> com.foo.test.TestOutputFormat not found
>       at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
>       at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
>       ... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to