[
https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664286#comment-16664286
]
Dagang Wei edited comment on SPARK-18673 at 10/25/18 9:20 PM:
--------------------------------------------------------------
Is it possible to fix this in org.spark-project.hive before SPARK-20202 "Remove
references to org.spark-project.hive" is resolved? In my Hadoop deployment
(Hadoop 3.1.0, Hive 3.1.0, and Spark 2.3.1), running spark-shell fails with:
java.lang.IllegalArgumentException: Unrecognized Hadoop major version number:
3.1.0
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
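For context, here is a simplified sketch (my reconstruction, not the actual Hive source) of the check that fails: Hive 1.2's ShimLoader only maps Hadoop major versions 1 and 2 to shim classes, so 3.1.0 falls through to the exception above.

```java
// Simplified sketch of hive-exec-1.2.1.spark2.jar's version check,
// reconstructed from the stack trace above (not the actual Hive source).
public class ShimVersionCheck {
    static String getMajorVersion(String vers) {
        String[] parts = vers.split("\\.");
        int major = Integer.parseInt(parts[0]);
        switch (major) {
            case 1: return "0.20S"; // Hadoop 1.x shims
            case 2: return "0.23";  // Hadoop 2.x shims
            default:
                // Hadoop 3.x is unknown to Hive 1.2, hence the failure
                throw new IllegalArgumentException(
                        "Unrecognized Hadoop major version number: " + vers);
        }
    }

    public static void main(String[] args) {
        System.out.println(getMajorVersion("2.7.3")); // Hadoop 2.x is accepted
        try {
            getMajorVersion("3.1.0");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same message as in the report
        }
    }
}
```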
After examining the JARs, it turns out that the
org.apache.hadoop.hive.shims.ShimLoader class that spark-shell loaded came
from <spark-home>/jars/hive-exec-1.2.1.spark2.jar rather than
<hive-home>/lib/hive-shims-common-3.1.0.jar. Could somebody point me to the
source code of hive-exec-1.2.1.spark2.jar, or explain in general how Spark's
fork of Hive works, so that I can fix the problem there?
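For anyone hitting the same conflict, this is one way to confirm which JAR a class was actually loaded from (a small helper I wrote for this purpose; the ShimLoader class name is from the report above, everything else is mine):

```java
// Prints the code source (JAR or directory) a class was loaded from.
// Run it on the same classpath as spark-shell (e.g. <spark-home>/jars/*)
// with the fully qualified class name as the argument, for example:
//   java WhichJar org.apache.hadoop.hive.shims.ShimLoader
import java.security.CodeSource;

public class WhichJar {
    static String locationOf(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK bootstrap classes report no code source; JAR classes report the JAR's URL
        return src == null ? "(bootstrap classpath)" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "WhichJar";
        System.out.println(locationOf(name));
    }
}
```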
> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
> ------------------------------------------------------------------
>
> Key: SPARK-18673
> URL: https://issues.apache.org/jira/browse/SPARK-18673
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Environment: Spark built with -Dhadoop.version=3.0.0-alpha2-SNAPSHOT
> Reporter: Steve Loughran
> Priority: Major
>
> Spark Dataframes fail to run on Hadoop 3.0.x, because the ShimLoader in Hive's
> JAR considers 3.x to be an unknown Hadoop version.
> Hive itself will have to fix this; as Spark uses its own hive 1.2.x JAR, it
> will need to be updated to match.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]