This is an automated email from the ASF dual-hosted git repository.

LuciferYang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b2580fc795ba [SPARK-57263][SQL] Support Hive 4.2 metastore
b2580fc795ba is described below

commit b2580fc795ba6b9d36f414e28a670501b6d8077f
Author: YangJie <[email protected]>
AuthorDate: Fri Jun 5 14:24:58 2026 +0800

    [SPARK-57263][SQL] Support Hive 4.2 metastore
    
    ### What changes were proposed in this pull request?
    
    This PR adds Hive `4.2.0` as a supported metastore client version, 
following 4.0 (SPARK-45265) and 4.1 (SPARK-53095).
    
    - Add `hive.v4_2` with the `extraDeps` taken from the Hive 4.2 POM. A few 
datanucleus/jdo deps are actually lower than in 4.1 (`datanucleus-api-jdo` 6.0.3 
vs 6.0.5, `datanucleus-core` 6.0.10 vs 6.0.11, `javax.jdo` 3.2.0 vs 3.2.1), 
while Derby is bumped to `10.17.1.0` for Java 21. There is a note in 
`package.scala` so these don't get "fixed" upward later.
    - `Shim_v4_2` extends `Shim_v4_1`. The shimmed method signatures are 
unchanged between 4.1 and 4.2, so the body is empty.
    - Hive 4.2 is compiled with `maven.compiler.target=21`, so its jars cannot 
load on an older JVM. When a 4.2 client is constructed on Java < 21, it now 
fails with `UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA` instead of a raw 
`UnsupportedClassVersionError`. The check lives on the client-construction path 
rather than in `hiveVersion()`, so config validation still resolves `4.2.0` 
normally.
    - `HiveClientVersions` includes `4.2` in the test sweep only on Java 21+.
    - Update the supported metastore version range in the docs.
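    
    The fail-fast behaviour described in the list above can be sketched in
    isolation as follows. This is a minimal illustration, not the code in this
    commit: `HiveJavaGuard`, `requiredJava`, and `check` are hypothetical names,
    and the sketch assumes the 4.2 line is the only one with a Java floor. The
    actual guard is in `IsolatedClientLoader` and uses
    `Utils.isJavaVersionAtLeast21`.
    
    ```scala
    // Hypothetical, self-contained sketch of a "Java floor" guard.
    object HiveJavaGuard {
      // Assumption: only the 4.2 line requires a minimum Java version.
      private val requiredJava: Map[String, Int] = Map("4.2" -> 21)
    
      /** Returns Left(message) when the JVM is too old for the given metastore version. */
      def check(
          fullVersion: String,
          currentJava: Int = Runtime.version().feature()): Either[String, Unit] = {
        // Reduce "4.2.0" to "4.2" so any patch release hits the same rule.
        val majorMinor = fullVersion.split('.').take(2).mkString(".")
        requiredJava.get(majorMinor) match {
          case Some(required) if currentJava < required =>
            Left(s"Hive metastore version $fullVersion requires Java $required or later, " +
              s"but the current JVM is Java $currentJava.")
          case _ => Right(())
        }
      }
    }
    ```
    
    Taking the current Java version as a parameter keeps the check testable
    on any JVM. Calling `HiveJavaGuard.check("4.2.0")` with no second argument
    falls back to the feature version of the running JVM.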
    
    ### Why are the changes needed?
    
    Hive 4.2.0 is released and supports JDK 21. Users on Java 21 should be able 
to connect Spark to a Hive 4.2 metastore via 
`spark.sql.hive.metastore.version=4.2.0`.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. `4.2.0` is now a valid value for `spark.sql.hive.metastore.version` 
(Java 21+). On an older JVM, the value still passes config validation, but 
constructing the metastore client fails fast with 
`UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA`.
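    
    As a rough usage sketch, a session targeting a Hive 4.2 metastore might be
    configured as below. The object name and app name are hypothetical, and
    setting `spark.sql.hive.metastore.jars` to `maven` is an assumption about
    the deployment (other setups may point it at local jars instead). The
    sketch also assumes the application is launched on Java 21+ with a master
    supplied externally, for example via `spark-submit`.
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    // Hypothetical example application; not part of this commit.
    object Hive42MetastoreExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-4.2-metastore-example") // hypothetical app name
          .config("spark.sql.hive.metastore.version", "4.2.0")
          // Assumption: download the matching client jars from Maven.
          .config("spark.sql.hive.metastore.jars", "maven")
          .enableHiveSupport()
          .getOrCreate()
    
        // Any catalog access constructs the metastore client.
        spark.sql("SHOW DATABASES").show()
        spark.stop()
      }
    }
    ```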
    
    ### How was this patch tested?
    
    - Adding `4.2` to `HiveClientVersions` runs it through `HiveClientSuites`, 
which loads the client via the isolated classloader (requires network access to 
download the 4.2 jars).
    - Checked the shimmed methods in `Hive.java` and `IMetaStoreClient.java` 
against Hive 4.2 source; no differences from 4.1.
    - `build/sbt 'core/testOnly *SparkThrowableSuite'` for the new error 
condition.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #56337 from LuciferYang/worktree-SPARK-hive42-metastore.
    
    Authored-by: YangJie <[email protected]>
    Signed-off-by: yangjie01 <[email protected]>
---
 .../src/main/resources/error/error-conditions.json |  6 ++++
 docs/sql-data-sources-hive-tables.md               |  3 +-
 docs/sql-migration-guide.md                        |  2 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 12 ++++++++
 .../org/apache/spark/sql/hive/HiveUtils.scala      |  3 +-
 .../spark/sql/hive/client/HiveClientImpl.scala     |  1 +
 .../apache/spark/sql/hive/client/HiveShim.scala    |  2 ++
 .../sql/hive/client/IsolatedClientLoader.scala     |  9 ++++++
 .../org/apache/spark/sql/hive/client/package.scala | 32 +++++++++++++++++++++-
 .../spark/sql/hive/client/HiveClientVersions.scala | 14 ++++++++--
 10 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 146db79e4cc6..4e66798d4039 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -8453,6 +8453,12 @@
     ],
     "sqlState" : "42K0E"
   },
+  "UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA" : {
+    "message" : [
+      "Hive metastore version <version> requires Java <requiredJavaVersion> or 
later, but the current JVM is Java <currentJavaVersion>. Please upgrade your 
Java version or use an earlier Hive metastore version."
+    ],
+    "sqlState" : "0A000"
+  },
   "UNSUPPORTED_INSERT" : {
     "message" : [
       "Can't insert into the target."
diff --git a/docs/sql-data-sources-hive-tables.md 
b/docs/sql-data-sources-hive-tables.md
index 977efa8f2433..f4587ee41eaa 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -130,7 +130,8 @@ The following options can be used to configure the version 
of Hive that is used
     <td><code>2.3.10</code></td>
     <td>
       Version of the Hive metastore. Available
-      options are <code>2.0.0</code> through <code>2.3.10</code>, 
<code>3.0.0</code> through <code>3.1.3</code>, and <code>4.0.0</code> through 
<code>4.1.0</code>.
+      options are <code>2.0.0</code> through <code>2.3.10</code>, 
<code>3.0.0</code> through <code>3.1.3</code>, and <code>4.0.0</code> through 
<code>4.2.0</code>.
+      Note: Hive 4.2 requires Java 21 or later.
     </td>
     <td>1.4.0</td>
   </tr>
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 265e933c0912..f0a1ed352fdf 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -1073,7 +1073,7 @@ Python UDF registration is unchanged.
 Spark SQL is designed to be compatible with the Hive Metastore, SerDes and 
UDFs.
 Currently, Hive SerDes and UDFs are based on built-in Hive,
 and Spark SQL can be connected to different versions of Hive Metastore
-(from 2.0.0 to 2.3.10 and 3.0.0 to 4.1.0. Also see [Interacting with Different 
Versions of Hive 
Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore).
+(from 2.0.0 to 2.3.10 and 3.0.0 to 4.2.0). Also see [Interacting with 
Different Versions of Hive 
Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore).
 
 #### Deploying in Existing Hive Warehouses
 {:.no_toc}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 48edb6e38126..5caf1914c04b 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1727,6 +1727,18 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase with ExecutionE
         "key" -> key))
   }
 
+  def unsupportedHiveMetastoreVersionForJavaError(
+      version: String,
+      requiredJavaVersion: Int,
+      currentJavaVersion: Int): SparkUnsupportedOperationException = {
+    new SparkUnsupportedOperationException(
+      errorClass = "UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA",
+      messageParameters = Map(
+        "version" -> version,
+        "requiredJavaVersion" -> requiredJavaVersion.toString,
+        "currentJavaVersion" -> currentJavaVersion.toString))
+  }
+
   def loadHiveClientCausesNoClassDefFoundError(
       cnf: NoClassDefFoundError,
       execJars: Seq[URL],
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
index 4028da153ff9..d4452ae83fa7 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@@ -77,7 +77,8 @@ private[spark] object HiveUtils extends Logging {
     .doc("Version of the Hive metastore. Available options are " +
       "<code>2.0.0</code> through <code>2.3.10</code>, " +
       "<code>3.0.0</code> through <code>3.1.3</code> and " +
-      "<code>4.0.0</code> through <code>4.1.0</code>.")
+      "<code>4.0.0</code> through <code>4.2.0</code>. " +
+      "Note: Hive 4.2 requires Java 21 or later.")
     .version("1.4.0")
     .stringConf
     .checkValue(isCompatibleHiveVersion, "Unsupported Hive Metastore version")
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 21db79116b52..dc6c65facadb 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -129,6 +129,7 @@ private[hive] class HiveClientImpl(
     case hive.v3_1 => new Shim_v3_1()
     case hive.v4_0 => new Shim_v4_0()
     case hive.v4_1 => new Shim_v4_1()
+    case hive.v4_2 => new Shim_v4_2()
   }
 
   // Create an internal session state for this HiveClientImpl.
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
index ef27669f5ba0..32d892883697 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
@@ -1545,3 +1545,5 @@ private[client] class Shim_v4_0 extends Shim_v3_1 {
 }
 
 private[client] class Shim_v4_1 extends Shim_v4_0
+
+private[client] class Shim_v4_2 extends Shim_v4_1
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index eb649d196ff6..fbe8be59ae51 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -102,6 +102,7 @@ private[hive] object IsolatedClientLoader extends Logging {
       case (3, 1, _) => Some(hive.v3_1)
       case (4, 0, _) => Some(hive.v4_0)
       case (4, 1, _) => Some(hive.v4_1)
+      case (4, 2, _) => Some(hive.v4_2)
       case _ => None
     }.getOrElse {
       throw QueryExecutionErrors.unsupportedHiveMetastoreVersionError(
@@ -206,6 +207,14 @@ private[hive] class IsolatedClientLoader(
     val barrierPrefixes: Seq[String] = Seq.empty)
   extends Logging {
 
+  // Hive 4.2 requires Java 21 or later. The guard lives on the 
client-construction path rather
+  // than in IsolatedClientLoader.hiveVersion, which is also used for 
version-string validation,
+  // so the actionable message reaches the user instead of being swallowed.
+  if (version == hive.v4_2 && !Utils.isJavaVersionAtLeast21) {
+    throw QueryExecutionErrors.unsupportedHiveMetastoreVersionForJavaError(
+      version.fullVersion, 21, Runtime.version().feature())
+  }
+
   /**
    * This controls whether the generated clients maintain an 
independent/isolated copy of the
    * Hive `SessionState`. If false, the Hive will leverage the global/static 
copy of
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
index 24ccbc7cbac4..3fef6dc5593c 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
@@ -129,8 +129,38 @@ package object client {
           }
         })
 
+    // Hive 4.2 ships with datanucleus-api-jdo:6.0.3 (down from 6.0.5), 
datanucleus-core:6.0.10
+    // (down from 6.0.11), and javax.jdo:3.2.0-release (down from 3.2.1) 
relative to Hive 4.1.
+    // These reflect the actual Hive 4.2 POM and must not be "upgraded" to 
v4_1 values.
+    // Derby was bumped to 10.17.1.0 (from 10.14.1.0 in v4_1) for Java 21 
compatibility.
+    case object v4_2 extends HiveVersion("4.2.0",
+      extraDeps =
+        "org.antlr:antlr4-runtime:4.9.3" ::
+        "org.apache.derby:derby:10.17.1.0" ::
+        "org.apache.hadoop:hadoop-hdfs:3.4.1" ::
+        "org.datanucleus:datanucleus-api-jdo:6.0.3" ::
+        "org.datanucleus:datanucleus-core:6.0.10" ::
+        "org.datanucleus:datanucleus-rdbms:6.0.10" ::
+        "org.datanucleus:javax.jdo:3.2.0-release" ::
+        "org.springframework:spring-core:5.3.39" ::
+        "org.springframework:spring-jdbc:5.3.39" :: Nil,
+      exclusions =
+        "org.apache.curator:*" ::
+        "org.apache.hive:hive-service-rpc" ::
+        "org.apache.zookeeper:zookeeper" :: Nil ++
+        {
+          if (!Utils.isTesting) {
+            // HiveClientImpl#runHive which is used for testing refers
+            // `org.apache.hadoop.hive.ql.DriverContext` indirectly and 
`DriverContext` refers
+            // Tez APIs.
+            Seq("org.apache.tez:tez-api")
+          } else {
+            Seq.empty
+          }
+        })
+
     val allSupportedHiveVersions: Set[HiveVersion] =
-      Set(v2_0, v2_1, v2_2, v2_3, v3_0, v3_1, v4_0, v4_1)
+      Set(v2_0, v2_1, v2_2, v2_3, v3_0, v3_1, v4_0, v4_1, v4_2)
   }
   // scalastyle:on
 
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
index c06e2dea40f9..0ea4834839e5 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
@@ -17,11 +17,21 @@
 
 package org.apache.spark.sql.hive.client
 
+import org.apache.spark.util.Utils
+
 private[client] trait HiveClientVersions {
   private val testVersions = sys.env.get("SPARK_TEST_HIVE_CLIENT_VERSIONS")
-  protected val versions = if (testVersions.nonEmpty) {
+  private val allVersions = if (testVersions.nonEmpty) {
     testVersions.get.split(",").map(_.trim).filter(_.nonEmpty).toIndexedSeq
   } else {
-    IndexedSeq("2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1")
+    IndexedSeq("2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1", "4.2")
+  }
+
+  protected val versions: IndexedSeq[String] = {
+    if (Utils.isJavaVersionAtLeast21) {
+      allVersions
+    } else {
+      allVersions.filterNot(_ == "4.2")
+    }
   }
 }

