[
https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664286#comment-16664286
]
Dagang Wei edited comment on SPARK-18673 at 10/25/18 9:20 PM:
--------------------------------------------------------------
Is it possible to fix this in org.spark-project.hive before SPARK-20202 "Remove
references to org.spark-project.hive" is resolved? In my Hadoop deployment
(Hadoop 3.1.0, Hive 3.1.0, and Spark 2.3.1), running spark-shell fails with:
java.lang.IllegalArgumentException: Unrecognized Hadoop major version number:
3.1.0
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
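For context, here is a simplified sketch (my reconstruction, not the actual Hive source) of the check that fails: Hive 1.2's ShimLoader only maps Hadoop major versions 1 and 2 to shim classes, so 3.1.0 falls through to the exception above.

```java
// Simplified sketch of hive-exec-1.2.1.spark2.jar's version check,
// reconstructed from the stack trace above (not the actual Hive source).
public class ShimVersionCheck {
    static String getMajorVersion(String vers) {
        String[] parts = vers.split("\\.");
        int major = Integer.parseInt(parts[0]);
        switch (major) {
            case 1: return "0.20S"; // Hadoop 1.x shims
            case 2: return "0.23";  // Hadoop 2.x shims
            default:
                // Hadoop 3.x is unknown to Hive 1.2, hence the failure
                throw new IllegalArgumentException(
                        "Unrecognized Hadoop major version number: " + vers);
        }
    }

    public static void main(String[] args) {
        System.out.println(getMajorVersion("2.7.3")); // Hadoop 2.x is accepted
        try {
            getMajorVersion("3.1.0");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same message as in the report
        }
    }
}
```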
After examining the JARs, it turns out that the
org.apache.hadoop.hive.shims.ShimLoader class that spark-shell loaded came
from <spark-home>/jars/hive-exec-1.2.1.spark2.jar rather than
<hive-home>/lib/hive-shims-common-3.1.0.jar. Could somebody point me to the
source code of hive-exec-1.2.1.spark2.jar, or explain in general how Spark's
fork of Hive works, so that I can fix the problem there?
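For anyone hitting the same conflict, this is one way to confirm which JAR a class was actually loaded from (a small helper I wrote for this purpose; the ShimLoader class name is from the report above, everything else is mine):

```java
// Prints the code source (JAR or directory) a class was loaded from.
// Run it on the same classpath as spark-shell (e.g. <spark-home>/jars/*)
// with the fully qualified class name as the argument, for example:
//   java WhichJar org.apache.hadoop.hive.shims.ShimLoader
import java.security.CodeSource;

public class WhichJar {
    static String locationOf(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK bootstrap classes report no code source; JAR classes report the JAR's URL
        return src == null ? "(bootstrap classpath)" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "WhichJar";
        System.out.println(locationOf(name));
    }
}
```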
> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
> ------------------------------------------------------------------
>
> Key: SPARK-18673
> URL: https://issues.apache.org/jira/browse/SPARK-18673
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Environment: Spark built with -Dhadoop.version=3.0.0-alpha2-SNAPSHOT
> Reporter: Steve Loughran
> Priority: Major
>
> Spark Dataframes fail to run on Hadoop 3.0.x, because the ShimLoader in Hive's
> JAR considers 3.x to be an unknown Hadoop version.
> Hive itself will have to fix this; as Spark uses its own hive 1.2.x JAR, it
> will need to be updated to match.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]