Luis Casillas created TEZ-3299:
----------------------------------

             Summary: Tez is incompatible with HADOOP_USE_CLIENT_CLASSLOADER=true
                 Key: TEZ-3299
                 URL: https://issues.apache.org/jira/browse/TEZ-3299
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.8.3
         Environment: Elastic MapReduce 4.7.0
            Reporter: Luis Casillas


The ticket HADOOP-10893 introduced a new environment variable, 
HADOOP_USE_CLIENT_CLASSLOADER, that makes the hadoop jar command put the client 
application's own bundled jars (in the jar file's lib/ directory) ahead of 
those bundled by the Hadoop installation.

Tez 0.8.3, however, does not play nicely with this feature, because Tez has 
classes under the org.apache.hadoop package hierarchy (e.g., 
org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat). Hadoop's 
ApplicationClassLoader, which implements the HADOOP_USE_CLIENT_CLASSLOADER=true 
feature, treats org.apache.hadoop. as a "system class" prefix in its default 
configuration: it refuses to load classes in those packages from the client's 
jars and always delegates them to the parent classloader instead. See the 
implementation for reference:

* https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ApplicationClassLoader.java
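A simplified sketch of the delegation rule described above, assuming a model of ApplicationClassLoader in which any name matching a system-class prefix goes straight to the parent, while every other class is looked up in the loader's own jars first. The class name `SystemFirstDelegationSketch` is hypothetical, and its prefix check is deliberately simpler than Hadoop's (no `-` exclusions, no nested-class rules); the linked source is the authority on the real behavior.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical, simplified model of ApplicationClassLoader's delegation:
 * "system" classes always go to the parent; everything else is tried in
 * this loader's own URLs first (child-first), then the parent.
 */
final class SystemFirstDelegationSketch extends URLClassLoader {
    private final List<String> systemPrefixes;
    private final ClassLoader parentLoader;

    SystemFirstDelegationSketch(URL[] urls, ClassLoader parent, List<String> systemPrefixes) {
        super(urls, Objects.requireNonNull(parent, "parent"));
        this.parentLoader = parent;
        this.systemPrefixes = new ArrayList<>(systemPrefixes);
    }

    /** Simplified check: plain prefix match against the configured list. */
    boolean isSystemClass(String name) {
        for (String prefix : systemPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Child-first: look in this loader's own jars.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in our jars; fall back to the parent below.
                }
            }
            if (c == null) {
                // System classes (e.g. anything under org.apache.hadoop.) always end
                // up here, even if this loader's own jars contain them. If the parent
                // cannot find the class either, this throws ClassNotFoundException.
                c = parentLoader.loadClass(name);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Under this model, a class such as TezGroupedSplitsInputFormat is never looked up in the client jars at all, so whether it loads depends entirely on the parent's classpath.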

Because of the way Elastic MapReduce 4.7.0 sets up the classpath for Tez 0.8.3, 
tez-mapreduce-0.8.3.jar is on the client classpath (the one 
ApplicationClassLoader loads from) but evidently not on the parent 
classloader's classpath, so my Cascading application fails in an *extremely 
confusing* way:

1. The JVM can load the 
`org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder` class 
successfully;
2. But it gets a `NoClassDefFoundError` for 
`org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat`

I say "extremely confusing" because *both of these classes are in the same 
jar*! The difference comes from ApplicationClassLoader, which logs its 
system-class list at the beginning of the job (note the org.apache.hadoop. 
entry):

{code}
16/06/11 00:51:15 INFO util.ApplicationClassLoader: system classes: [java., 
javax.accessibility., javax.activation., javax.activity., javax.annotation., 
javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., 
javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., 
javax.net., javax.print., javax.rmi., javax.script., 
-javax.security.auth.message., javax.security.auth., javax.security.cert., 
javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., 
javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., 
org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., 
org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, 
yarn-default.xml]
{code}
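The entries in this log act as prefix rules. The sketch below evaluates such a list, modeled on the matching rules in Hadoop's ApplicationClassLoader as I understand them (assumptions: an entry ending in `.` matches a whole package, any other entry matches one class plus its `$` nested classes, and a leading `-` marks an exclusion that always wins). `SystemClassRules` is a hypothetical name, not a Hadoop API. It shows how two classes from the same jar get classified differently.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper modeled on ApplicationClassLoader's system-class matching. */
final class SystemClassRules {
    private SystemClassRules() {}

    /**
     * Returns true if className matches the system-class list.
     * An entry ending in '.' matches a whole package; any other entry matches
     * one class and its nested classes ('$'). A leading '-' makes the entry an
     * exclusion, and a matching exclusion wins regardless of list order.
     */
    static boolean isSystemClass(String className, List<String> systemClasses) {
        String name = className.replace('/', '.');
        boolean result = false;
        for (String entry : systemClasses) {
            boolean include = true;
            String prefix = entry;
            if (prefix.startsWith("-")) {
                prefix = prefix.substring(1);
                include = false;
            }
            if (!name.startsWith(prefix)) {
                continue;
            }
            boolean matches = prefix.endsWith(".")            // package entry
                    || name.length() == prefix.length()        // exact class
                    || name.charAt(prefix.length()) == '$';    // nested class
            if (matches) {
                if (!include) {
                    return false; // exclusion wins
                }
                result = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A subset of the list logged above.
        List<String> logged = Arrays.asList(
                "java.", "-javax.management.j2ee.", "javax.management.",
                "org.apache.commons.logging.", "org.apache.log4j.", "org.apache.hadoop.");
        for (String cls : Arrays.asList(
                "org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder",
                "org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat")) {
            // Prints false for the MRInput builder and true for TezGroupedSplitsInputFormat.
            System.out.println(cls + " -> system class? " + isSystemClass(cls, logged));
        }
    }
}
```

Under these rules, the `-` entries in the log (such as -javax.management.j2ee.) carve exceptions out of broader package entries (such as javax.management.).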

This can also be verified by exporting HADOOP_OPTS='-verbose:class' before 
running my application:

    [Loaded org.apache.tez.mapreduce.partition.MRPartitioner from 
file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
    [Loaded org.apache.tez.mapreduce.hadoop.MRInputHelpers from 
file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
    [Loaded org.apache.tez.mapreduce.input.MRInput$MRInputHelpersInternal from 
file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
    [Loaded org.apache.hadoop.mapreduce.InputFormat from 
file:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2-amzn-2.jar]
    
    ...
    
    16/06/11 00:51:32 ERROR dataplatform.Main: Uncaught exception
    java.lang.NoClassDefFoundError: 
org/apache/hadoop/mapreduce/split/TezGroupedSplitsInputFormat
            at 
org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.createGeneratorDataSource(MRInput.java:325)
            at 
org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.build(MRInput.java:249)
            at 
cascading.flow.tez.Hadoop2TezFlowStep.createVertex(Hadoop2TezFlowStep.java:515)
            at 
cascading.flow.tez.Hadoop2TezFlowStep.createDAG(Hadoop2TezFlowStep.java:216)
            at 
cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:197)
            at 
cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:123)
            at 
cascading.flow.planner.BaseFlowStep.getCreateFlowStepJob(BaseFlowStep.java:916)
            at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1353)
            at cascading.flow.BaseFlow.initialize(BaseFlow.java:247)
            at 
cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:203)
            at cascading.flow.FlowConnector.connect(FlowConnector.java:456)
            at 
com.progressfin.dataplatform.sip.SipAddressFlow.buildFlow(SipAddressFlow.java:70)
            at 
com.progressfin.dataplatform.AllTheFlows.getAllFlows(AllTheFlows.java:141)
            at 
com.progressfin.dataplatform.AllTheFlows.getEverythingCascade(AllTheFlows.java:119)
            at com.progressfin.dataplatform.Main.run(Main.java:114)
            at com.progressfin.dataplatform.Main.main(Main.java:81)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat
            at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            at 
org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:200)
            at 
org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:168)
            ... 22 more
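Roughly the same information that -verbose:class prints can also be collected from inside the application, which may help when only some classes misbehave. A small sketch using standard JDK reflection (`Class.getClassLoader()` and the class's `ProtectionDomain`/`CodeSource`); `ClassOrigin` is a hypothetical helper, and the sketch assumes no security manager blocks `getProtectionDomain()`. Using the thread context classloader assumes that RunJar installs its classloader there, which is worth verifying for a given Hadoop version.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic helper: which loader defined a class, and from where. */
final class ClassOrigin {
    private ClassOrigin() {}

    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        // Bootstrap classes (e.g. java.lang.String) report a null loader.
        String loaderName = (loader == null) ? "<bootstrap>" : loader.getClass().getName();
        CodeSource source = c.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        String where = (location == null) ? "<unknown>" : location.toString();
        return c.getName() + " defined by " + loaderName + " from " + where;
    }

    /** Loads className through loader without initializing it, then describes it. */
    static String describe(String className, ClassLoader loader) {
        try {
            return describe(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " could not be loaded: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumption: the context classloader is the one the application was launched with.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : args) {
            System.out.println(describe(name, loader));
        }
    }
}
```

Passing both org.apache.tez.mapreduce.input.MRInput and org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat as arguments would show whether the two resolve through different loaders.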

So if I may suggest a solution, perhaps Tez should refrain from putting any 
classes under the org.apache.hadoop package, because Hadoop may refuse to load 
them under some configurations!
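To see which classes in a given jar fall under the org.apache.hadoop prefix (and so could be affected by the default system-class list), one could list the jar's matching class entries. A sketch using `java.util.jar.JarFile`; `HadoopPackageScan` is a hypothetical name, and it checks only the `org/apache/hadoop/` prefix, not the full system-class rules.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: lists classes in a jar that live under org.apache.hadoop. */
final class HadoopPackageScan {
    private HadoopPackageScan() {}

    static final String HADOOP_PREFIX = "org/apache/hadoop/";

    /** Converts matching ".class" entry names to binary class names, sorted. */
    static List<String> filterHadoopClasses(Iterable<String> entryNames) {
        List<String> result = new ArrayList<>();
        for (String entry : entryNames) {
            if (entry.startsWith(HADOOP_PREFIX) && entry.endsWith(".class")) {
                String binaryName = entry.substring(0, entry.length() - ".class".length());
                result.add(binaryName.replace('/', '.'));
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Reads all entry names from the jar at jarPath and filters them. */
    static List<String> scanJar(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return filterHadoopClasses(names);
    }

    public static void main(String[] args) throws IOException {
        for (String jarPath : args) {
            for (String cls : scanJar(jarPath)) {
                System.out.println(jarPath + ": " + cls);
            }
        }
    }
}
```

Running it against /usr/lib/tez/tez-mapreduce-0.8.3.jar would list every Tez class that the default configuration delegates to the parent classloader.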



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
