[
https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-32256.
----------------------------------
Fix Version/s: 3.1.0, 3.0.1
Resolution: Fixed
Issue resolved by pull request 29059
[https://github.com/apache/spark/pull/29059]
> Hive may fail to detect Hadoop version when using isolated classloader
> ----------------------------------------------------------------------
>
> Key: SPARK-32256
> URL: https://issues.apache.org/jira/browse/SPARK-32256
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Shixiong Zhu
> Assignee: Shixiong Zhu
> Priority: Blocker
> Fix For: 3.0.1, 3.1.0
>
>
> Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars
> to access Hive Metastore. These jars are loaded by the isolated classloader.
> Because we also share Hadoop classes with the isolated classloader, the user
> doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which
> means when we are using the isolated classloader, hadoop-common jar is not
> available in this case. If Hadoop VersionInfo is not initialized before we
> switch to the isolated classloader, and we try to initialize it using the
> isolated classloader (the current thread context classloader), it will fail
> and report `Unknown` which causes Hive to throw the following exception:
> {code}
> java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
> at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147)
> at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122)
> at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88)
> at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377)
> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
> at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
> at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:219)
> at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:67)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548)
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
> at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080)
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108)
> at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349)
> at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217)
> at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204)
> at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:331)
> at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292)
> at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262)
> at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247)
> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543)
> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511)
> at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175)
> at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:128)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301)
> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431)
> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324)
> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72)
> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71)
> at org.apache.spark.sql.hive.client.HadoopVersionInfoSuite.$anonfun$new$1(HadoopVersionInfoSuite.scala:63)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> {code}
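The `Unknown` value can be reproduced in isolation: Hadoop's `VersionInfo` reads a properties file from the classpath, and `ShimLoader.getMajorVersion` rejects any value not in `A.B.*` form. The snippet below is a simplified stand-in for that logic, not the real Hadoop source; only the resource name `common-version-info.properties` and the exception message mirror hadoop-common and Hive.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Properties;

public class VersionInfoSketch {
    // Simplified version lookup: read the hadoop-common properties file from
    // the given classloader, falling back to "Unknown" when it is missing.
    static String version(ClassLoader cl) {
        Properties props = new Properties();
        try (InputStream in = cl.getResourceAsStream("common-version-info.properties")) {
            if (in != null) props.load(in);
        } catch (Exception ignored) { }
        return props.getProperty("version", "Unknown");
    }

    // Mirrors ShimLoader.getMajorVersion's check: "Unknown" fails the A.B.* test.
    static String majorVersion(String v) {
        String[] parts = v.split("\\.");
        if (parts.length < 2) {
            throw new RuntimeException(
                "Illegal Hadoop Version: " + v + " (expected A.B.* format)");
        }
        return parts[0] + "." + parts[1];
    }

    public static void main(String[] args) {
        // An empty classloader with no parent stands in for the isolated
        // classloader that does not contain hadoop-common.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);
        String v = version(isolated);
        System.out.println(v); // prints "Unknown": the properties file is absent
        try {
            majorVersion(v);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```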
> Technically, this is an issue in Hadoop's VersionInfo that has already been
> fixed: https://issues.apache.org/jira/browse/HADOOP-14067. But since we still
> support old Hadoop versions, we should fix it on the Spark side as well.
> Why does this issue start to happen in Spark 3.0.0?
> In Spark 2.4.x, we use Hive 1.2.1 by default, which triggers `VersionInfo`
> initialization in the static code of the `Hive` class. This happens when we
> load the `HiveClientImpl` class, because `HiveClientImpl.client` refers to the
> `Hive` class. At that moment, the thread context classloader is not yet the
> isolated classloader, so it can access the hadoop-common jar on the classpath
> and initialize `VersionInfo` correctly.
> In Spark 3.0.0, we use Hive 2.3.7. The static code of the `Hive` class no
> longer touches `VersionInfo` because of the change in
> https://issues.apache.org/jira/browse/HIVE-11657. Instead, `VersionInfo` is
> accessed when creating a `Hive` object (see the above stack trace). This
> happens here:
> https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L260.
> But we switch to the isolated classloader before calling
> `HiveClientImpl.client` (see
> https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L283).
> This is exactly the scenario described above: if Hadoop VersionInfo is not
> initialized before we switch to the isolated classloader, and we try to
> initialize it using the isolated classloader (the current thread context
> classloader), it will fail.
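The ordering hazard described above can be reproduced without Hadoop at all: a class whose static initializer consults the thread context classloader captures whatever loader happens to be current at first reference. In the hypothetical sketch below, `LazyVersion` stands in for Hadoop's `VersionInfo`, and an empty `URLClassLoader` stands in for the isolated classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class InitOrder {
    static class LazyVersion {
        // Runs exactly once, at first reference to the class, using whatever
        // thread context classloader is current at that moment. The resource
        // probe is a stand-in for VersionInfo reading its properties file.
        static final String VERSION =
            Thread.currentThread().getContextClassLoader()
                  .getResource("InitOrder.class") != null ? "3.2.0" : "Unknown";
    }

    public static void main(String[] args) {
        Thread t = Thread.currentThread();
        ClassLoader original = t.getContextClassLoader();
        // Swap in an "isolated" loader that cannot see the resource...
        t.setContextClassLoader(new URLClassLoader(new URL[0], null));
        try {
            // ...then trigger the static initializer too late:
            System.out.println(LazyVersion.VERSION); // prints "Unknown"
        } finally {
            t.setContextClassLoader(original);
        }
    }
}
```

Had `LazyVersion.VERSION` been referenced before the swap (as Hive 1.2.1's static initializer effectively did for `VersionInfo`), the correct value would have been captured and cached.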
> I marked this as a blocker because it is a regression in 3.0.0, caused by
> upgrading the Hive execution version from 1.2.1 to 2.3.7.
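One defensive pattern consistent with the description above is to force the version lookup while the original classloader is still the thread context classloader, so the cached result survives the swap. This is a sketch only, not necessarily how pull request 29059 implements the fix; `EagerInit`, `initBeforeIsolation`, and `cachedVersion` are all hypothetical names.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class EagerInit {
    static String cachedVersion = "Unknown";

    // Perform the classloader-sensitive lookup eagerly, before any isolation.
    static void initBeforeIsolation() {
        // With the application classloader still reachable, the resource probe
        // succeeds once; the literal "3.2.0" stands in for the value that
        // VersionInfo.getVersion() would return.
        ClassLoader app = EagerInit.class.getClassLoader();
        if (app.getResource("EagerInit.class") != null) {
            cachedVersion = "3.2.0";
        }
    }

    public static void main(String[] args) {
        initBeforeIsolation(); // eager: runs before the classloader swap
        Thread t = Thread.currentThread();
        ClassLoader original = t.getContextClassLoader();
        t.setContextClassLoader(new URLClassLoader(new URL[0], null));
        try {
            // The cached value is unaffected by the now-isolated TCCL.
            System.out.println(cachedVersion); // prints "3.2.0"
        } finally {
            t.setContextClassLoader(original);
        }
    }
}
```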
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]