[ https://issues.apache.org/jira/browse/HIVE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446572#comment-13446572 ]

Zhenxiao Luo commented on HIVE-3423:
------------------------------------

This is a dependency problem in
common/src/java/org/apache/hadoop/hive/common/FileUtils.java.

In FileUtils.java, tar() uses GzipCompressorOutputStream, and that class is
imported at the top of the file.

When FileSinkOperator.java calls FileUtils.makePartName(dpColNames, row) in its
getDynPartDirectory() method, this NoClassDefFoundError is triggered.

This happens because tar() and makePartName() are both static methods of the
same class, FileUtils.

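When a static method of a class is called for the first time, the JVM has to
load, link, and initialize the whole class, not just that one method. Linking
includes bytecode verification of every method in the class, and verifying
tar() can require loading GzipCompressorOutputStream (for example, to check
that it may be used where a more general stream type is expected).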

So when makePartName() is called from FileSinkOperator.getDynPartDirectory(),
FileUtils is loaded and linked as a whole, including tar(), and the JVM tries
to resolve the GzipCompressorOutputStream dependency.
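The point that the class, not the individual method, is the unit the JVM
loads and initializes can be seen in a small self-contained sketch.
FileUtilsLike, its simplified makePartName(), and the tar() stub are
hypothetical stand-ins, not Hive code. A single-file example cannot reproduce
the missing-jar failure itself, because that needs the dependency to be present
at compile time but absent at run time.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassInitDemo {
    // Records events so we can observe when FileUtilsLike is initialized.
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical stand-in for FileUtils: two unrelated static methods in one class.
    static class FileUtilsLike {
        static {
            // Runs once, the first time any static method of this class is invoked.
            EVENTS.add("FileUtilsLike initialized");
        }

        // Hypothetical, simplified makePartName: joins col=value pairs with '/'.
        // No escaping is done here; Hive's real implementation may differ.
        static String makePartName(List<String> cols, List<String> vals) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < cols.size(); i++) {
                if (i > 0) {
                    sb.append('/');
                }
                sb.append(cols.get(i)).append('=').append(vals.get(i));
            }
            return sb.toString();
        }

        // Stand-in for tar(); in the real class this body uses GzipCompressorOutputStream.
        static void tar() {
            EVENTS.add("tar called");
        }
    }

    public static void main(String[] args) {
        String name = FileUtilsLike.makePartName(
            List.of("ds", "hr"), List.of("2008-04-08", "11"));
        System.out.println(name);   // ds=2008-04-08/hr=11
        System.out.println(EVENTS); // [FileUtilsLike initialized]
    }
}
```

Calling only makePartName() still runs FileUtilsLike's static initializer,
even though tar() is never executed.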

GzipCompressorOutputStream comes from commons-compress*.jar. This dependency
was added to common/ivy.xml, but the jar is not bundled into hive-exec.jar.

When running on a single machine, the tests pass, since commons-compress*.jar
is resolved by Ivy and is present in build/ivy/lib/default.

But on a real cluster, commons-compress*.jar is not on the classpath of every
mapper and reducer, so this NoClassDefFoundError is triggered.
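To confirm which classpath a given process actually sees, a small probe may
help. ClasspathProbe is a hypothetical diagnostic helper, not part of Hive or
Hadoop. It uses Class.forName(name, false, loader), which loads the class
without running its static initializer, and reports where the class came from.

```java
import java.security.CodeSource;

public class ClasspathProbe {
    // Returns the location (jar or directory) a class was loaded from,
    // "<bootstrap>" for JDK bootstrap classes, or null if it cannot be loaded.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Bootstrap classes (e.g. java.lang.String) have no CodeSource.
            return src == null ? "<bootstrap>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream";
        String where = locate(name);
        System.out.println(where == null
            ? name + " NOT on classpath"
            : name + " loaded from " + where);
    }
}
```

Run with the same classpath a map or reduce task would use: a jar location
means the class is visible, while "NOT on classpath" matches the failure
described above.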

My proposed solution is to bundle commons-compress*.jar into hive-exec.jar.
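Assuming the jar gets bundled as proposed, one way to check a built
hive-exec.jar is to look for the class file entry directly. JarContainsCheck is
a hypothetical helper, not part of Hive's build; it relies only on
java.util.jar.JarFile.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

public class JarContainsCheck {
    // Class file entry that should be present if commons-compress was bundled.
    static final String GZIP_ENTRY =
        "org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream.class";

    // Returns true if the jar has an entry with exactly this name.
    static boolean containsEntry(File jar, String entryName) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JarContainsCheck <path/to/hive-exec.jar>");
            System.exit(2);
        }
        File jar = new File(args[0]);
        boolean ok = containsEntry(jar, GZIP_ENTRY);
        System.out.println(jar + (ok ? " bundles " : " does NOT bundle ") + GZIP_ENTRY);
        // Non-zero exit status lets a build script fail on a missing entry.
        System.exit(ok ? 0 : 1);
    }
}
```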
                
> merge_dynamic_partition.q is failing when running hive on real cluster
> ----------------------------------------------------------------------
>
>                 Key: HIVE-3423
>                 URL: https://issues.apache.org/jira/browse/HIVE-3423
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: Zhenxiao Luo
>            Assignee: Zhenxiao Luo
>             Fix For: 0.10.0
>
>
> merge_dynamic_partition (and a number of other qfiles) is failing when 
> running the current hive on a real cluster:
> java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream
>       at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:644)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:613)
>       at 
