[ https://issues.apache.org/jira/browse/HIVE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446572#comment-13446572 ]
Zhenxiao Luo commented on HIVE-3423:
------------------------------------
This is a dependency problem in
common/src/java/org/apache/hadoop/hive/common/FileUtils.java:
FileUtils.tar() uses GzipCompressorOutputStream, and FileUtils.java imports
that class.
The NoClassDefFoundError is triggered when FileSinkOperator.java calls
FileUtils.makePartName(dpColNames, row) in its getDynPartDirectory() function,
because tar() and makePartName() are both static functions in FileUtils.java.
When a static member of a class is first referenced, the JVM has to load, link
and initialize the whole class, not just the member being used. Linking
verifies every method in the class, and verification can require the classes
those methods depend on to be available.
So when makePartName() is called from FileSinkOperator.getDynPartDirectory(),
tar() is processed along with the rest of FileUtils, and the JVM tries to
resolve the GzipCompressorOutputStream dependency.
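As a rough illustration of the class-level granularity involved, the sketch
below uses two hypothetical nested classes (PartNames and Archiver are made-up
stand-ins, not Hive classes) to show that the JVM sets up a class as a whole
the first time one of its static members is used, and that a class which is
never used is left alone:

```java
import java.util.ArrayList;
import java.util.List;

public class LazyInitDemo {
    // Records which nested classes have run their static initializers.
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical stand-in for a partition-name helper.
    static class PartNames {
        static { EVENTS.add("PartNames initialized"); }

        static String make(String column, String value) {
            return column + "=" + value;
        }
    }

    // Hypothetical stand-in for an archiving helper that would need an extra jar.
    static class Archiver {
        static { EVENTS.add("Archiver initialized"); }

        static String tar(String dir) {
            return dir + ".tar.gz";
        }
    }

    public static void main(String[] args) {
        // Only PartNames is used, so only PartNames is initialized.
        System.out.println(PartNames.make("ds", "2012-09-01")); // prints ds=2012-09-01
        System.out.println(EVENTS); // prints [PartNames initialized]
    }
}
```

This suggests that if the two helpers lived in separate classes, calling the
partition-name helper would not need to touch the archiving code at all. That
is only an observation about the JVM, not a change proposed for this issue.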
GzipCompressorOutputStream comes from commons-compress*.jar. This dependency
was added to common/ivy.xml, but it is not included in hive-exec.jar.
When running on a single machine the test passes, since commons-compress*.jar
is resolved by ivy and is in build/ivy/lib/default.
But when running on a real cluster, commons-compress*.jar is not on the
classpath of the mappers and reducers, so the NoClassDefFoundError is
triggered.
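One way to confirm this kind of classpath difference is to probe for the class
directly in the environment in question. The following is a minimal sketch, not
part of Hive: ClasspathCheck and isClassAvailable are hypothetical names, and
only the standard Class.forName call is used.

```java
public class ClasspathCheck {
    // Returns true if the named class can be loaded by this class's loader.
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: locate and load the class without running
            // its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or one of its own dependencies failed to link.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this issue.
        String name =
            "org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

Run with the same classpath as a task JVM, this should report whether
commons-compress is visible there.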
My proposed solution is to add commons-compress*.jar into hive-exec.jar.
> merge_dynamic_partition.q is failing when running hive on real cluster
> ----------------------------------------------------------------------
>
> Key: HIVE-3423
> URL: https://issues.apache.org/jira/browse/HIVE-3423
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
> Fix For: 0.10.0
>
>
> merge_dynamic_partition (and a number of other qfiles) is failing when
> running the current hive on a real cluster:
> java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream
> at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:644)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:613)
> at