Luis Casillas created TEZ-3299:
----------------------------------
Summary: Tez is incompatible with HADOOP_USE_CLIENT_CLASSLOADER=true
Key: TEZ-3299
URL: https://issues.apache.org/jira/browse/TEZ-3299
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.8.3
Environment: Elastic MapReduce 4.7.0
Reporter: Luis Casillas
The ticket HADOOP-10893 introduced a new environment variable,
HADOOP_USE_CLIENT_CLASSLOADER, that makes the hadoop jar command put the client
application's own bundled jars (in the jar file's lib/ directory) ahead of
those bundled by the Hadoop installation.
Tez 0.8.3, however, does not play nicely with this feature. The reason is that
Tez has classes under the org.apache.hadoop package hierarchy (e.g.,
org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat).
Hadoop's ApplicationClassLoader class, which implements the
HADOOP_USE_CLIENT_CLASSLOADER=true feature, in its default configuration will
refuse to load classes inside the org.apache.hadoop packages, instead
delegating to the parent classloader. See the implementation for reference:
* https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ApplicationClassLoader.java
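To illustrate the behavior described above, here is a minimal sketch of prefix-based "system class" matching. It is not Hadoop's actual code: the class name SystemClassSketch, the shortened SYSTEM_CLASSES list, and the simplified matching rules are assumptions made for illustration. Entries starting with "-" are treated as exclusions, which is how I read the leading dashes in the logged configuration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified sketch; not the real ApplicationClassLoader.
final class SystemClassSketch {

    // Illustrative subset of a system-classes list (an assumption, not the full default).
    static final List<String> SYSTEM_CLASSES = Arrays.asList(
            "java.", "-javax.xml.rpc.", "javax.xml.", "org.apache.log4j.", "org.apache.hadoop.");

    // Returns true if className matches an included prefix and no excluded ("-") prefix
    // that appears before the end of the list is hit first.
    static boolean isSystemClass(String className, List<String> systemClasses) {
        boolean result = false;
        for (String entry : systemClasses) {
            boolean exclude = entry.startsWith("-");
            String prefix = exclude ? entry.substring(1) : entry;
            if (className.startsWith(prefix)) {
                if (exclude) {
                    return false; // an exclusion wins immediately
                }
                result = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder",
            "org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat"
        };
        for (String name : names) {
            String route = isSystemClass(name, SYSTEM_CLASSES)
                    ? "system class: delegated to the parent classloader"
                    : "not a system class: client classloader tries its own jars first";
            System.out.println(name + " -> " + route);
        }
    }
}
```

Under these assumptions, any class whose name starts with org.apache.hadoop. is routed to the parent classloader regardless of which jar actually contains it, while classes under org.apache.tez. are not.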
Because of the way Elastic MapReduce 4.7.0 sets up the classpath for Tez 0.8.3,
tez-mapreduce-0.8.3.jar ends up on the client classpath, so my Cascading
application gets this *extremely confusing* failure:
1. The JVM can load the
`org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder` class
successfully;
2. But it gets a `NoClassDefFoundError` for
`org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat`
I say "extremely confusing" because *both of these classes are in the same
jar*! The difference is caused by ApplicationClassLoader, which logs its
configuration at the beginning of the job (note the org.apache.hadoop. entry
near the end of the list):
{code}
16/06/11 00:51:15 INFO util.ApplicationClassLoader: system classes: [java.,
javax.accessibility., javax.activation., javax.activity., javax.annotation.,
javax.annotation.processing., javax.crypto., javax.imageio., javax.jws.,
javax.lang.model., -javax.management.j2ee., javax.management., javax.naming.,
javax.net., javax.print., javax.rmi., javax.script.,
-javax.security.auth.message., javax.security.auth., javax.security.cert.,
javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools.,
javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml.,
org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j.,
org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml,
yarn-default.xml]
{code}
This can also be verified by exporting HADOOP_OPTS='-verbose:class' before
running my application:
[Loaded org.apache.tez.mapreduce.partition.MRPartitioner from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
[Loaded org.apache.tez.mapreduce.hadoop.MRInputHelpers from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
[Loaded org.apache.tez.mapreduce.input.MRInput$MRInputHelpersInternal from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
[Loaded org.apache.hadoop.mapreduce.InputFormat from file:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2-amzn-2.jar]
...
16/06/11 00:51:32 ERROR dataplatform.Main: Uncaught exception
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/split/TezGroupedSplitsInputFormat
    at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.createGeneratorDataSource(MRInput.java:325)
    at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.build(MRInput.java:249)
    at cascading.flow.tez.Hadoop2TezFlowStep.createVertex(Hadoop2TezFlowStep.java:515)
    at cascading.flow.tez.Hadoop2TezFlowStep.createDAG(Hadoop2TezFlowStep.java:216)
    at cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:197)
    at cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:123)
    at cascading.flow.planner.BaseFlowStep.getCreateFlowStepJob(BaseFlowStep.java:916)
    at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1353)
    at cascading.flow.BaseFlow.initialize(BaseFlow.java:247)
    at cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:203)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:456)
    at com.progressfin.dataplatform.sip.SipAddressFlow.buildFlow(SipAddressFlow.java:70)
    at com.progressfin.dataplatform.AllTheFlows.getAllFlows(AllTheFlows.java:141)
    at com.progressfin.dataplatform.AllTheFlows.getEverythingCascade(AllTheFlows.java:119)
    at com.progressfin.dataplatform.Main.run(Main.java:114)
    at com.progressfin.dataplatform.Main.main(Main.java:81)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:200)
    at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:168)
    ... 22 more
So if I may suggest a solution, perhaps Tez should refrain from putting any
classes under the org.apache.hadoop package, because Hadoop may refuse to load
them under some configurations!