This is an automated email from the ASF dual-hosted git repository.
LuciferYang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b2580fc795ba [SPARK-57263][SQL] Support Hive 4.2 metastore
b2580fc795ba is described below
commit b2580fc795ba6b9d36f414e28a670501b6d8077f
Author: YangJie <[email protected]>
AuthorDate: Fri Jun 5 14:24:58 2026 +0800
[SPARK-57263][SQL] Support Hive 4.2 metastore
### What changes were proposed in this pull request?
This PR adds Hive `4.2.0` as a supported metastore client version,
following 4.0 (SPARK-45265) and 4.1 (SPARK-53095).
- Add `hive.v4_2` with the `extraDeps` taken from the Hive 4.2 POM. Three
DataNucleus/JDO deps are lower than in 4.1 (`datanucleus-api-jdo` 6.0.3
vs 6.0.5, `datanucleus-core` 6.0.10 vs 6.0.11, `javax.jdo` 3.2.0 vs 3.2.1),
while Derby is bumped to `10.17.1.0` for Java 21. A note in
`package.scala` records this so these don't get "fixed" upward later.
- `Shim_v4_2` extends `Shim_v4_1`. The shimmed method signatures are
unchanged between 4.1 and 4.2, so the body is empty.
- Hive 4.2 is compiled with `maven.compiler.target=21`, so its jars cannot
load on an older JVM. When a 4.2 client is constructed on Java < 21, it now
fails with `UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA` instead of a raw
`UnsupportedClassVersionError`. The check lives on the client-construction path
rather than in `hiveVersion()`, so config validation still resolves `4.2.0`
normally.
- `HiveClientVersions` includes `4.2` in the test sweep only on Java 21+.
- Update the supported metastore version range in the docs.
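
The split between version resolution and the Java guard described in the list above can be sketched in plain Scala, without Spark on the classpath. This is a minimal illustration under stated assumptions, not the code from the patch: `HiveJavaGuard`, `resolve`, `checkJava`, and the `minJava` table are hypothetical names, and a plain `UnsupportedOperationException` stands in for Spark's error condition.

```scala
object HiveJavaGuard {
  // Hypothetical lookup table: minimum Java feature version per Hive (major, minor).
  // Only the 4.2 -> 21 entry is taken from the commit message.
  private val minJava: Map[(Int, Int), Int] = Map((4, 2) -> 21)

  // Version-string resolution. No JVM check happens here, so "4.2.0"
  // still resolves on any JVM and config validation is unaffected.
  def resolve(version: String): Option[(Int, Int)] =
    version.split("\\.").toList match {
      case maj :: min :: _ =>
        for (a <- maj.toIntOption; b <- min.toIntOption) yield (a, b)
      case _ => None
    }

  // Client-construction path: fail with a readable message instead of
  // letting class loading fail later on an incompatible JVM.
  def checkJava(version: String, currentJava: Int = Runtime.version().feature()): Unit =
    resolve(version).flatMap(key => minJava.get(key)).foreach { required =>
      if (currentJava < required) {
        throw new UnsupportedOperationException(
          s"Hive metastore version $version requires Java $required or later, " +
            s"but the current JVM is Java $currentJava.")
      }
    }
}
```

Calling `HiveJavaGuard.checkJava("4.2.0", 17)` throws, while `HiveJavaGuard.resolve("4.2.0")` succeeds regardless of the JVM, which mirrors the intent of keeping the check out of `hiveVersion()`.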
### Why are the changes needed?
Hive 4.2.0 is released and supports JDK 21. Users on Java 21 should be able
to connect Spark to a Hive 4.2 metastore via
`spark.sql.hive.metastore.version=4.2.0`.
### Does this PR introduce _any_ user-facing change?
Yes. `4.2.0` is now a valid value for `spark.sql.hive.metastore.version`
(Java 21+). On an older JVM, the value still passes config validation, but
constructing the metastore client fails fast with
`UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA`.
### How was this patch tested?
- Adding `4.2` to `HiveClientVersions` runs it through `HiveClientSuites`,
which loads the client via the isolated classloader (requires network access to
download the 4.2 jars).
- Checked the shimmed methods in `Hive.java` and `IMetaStoreClient.java`
against Hive 4.2 source; no differences from 4.1.
- `build/sbt 'core/testOnly *SparkThrowableSuite'` for the new error
condition.
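
The Java-dependent test sweep mentioned in the first item above can be sketched as a pure function. This is an illustrative assumption, not the trait from the patch: `VersionSweep` and `select` are hypothetical names, and the environment value and Java feature version are passed in explicitly so the behaviour can be checked on any JVM.

```scala
object VersionSweep {
  // Default sweep, matching the version list named in this commit.
  private val defaults: IndexedSeq[String] =
    IndexedSeq("2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1", "4.2")

  // envValue models SPARK_TEST_HIVE_CLIENT_VERSIONS; javaFeature models the
  // running JVM's feature version (for example 17 or 21).
  def select(envValue: Option[String], javaFeature: Int): IndexedSeq[String] = {
    val all = envValue match {
      case Some(v) => v.split(",").map(_.trim).filter(_.nonEmpty).toIndexedSeq
      case None => defaults
    }
    // Hive 4.2 jars cannot load below Java 21, so drop 4.2 from the sweep there.
    if (javaFeature >= 21) all else all.filterNot(_ == "4.2")
  }
}
```

With this shape, an explicit request for `4.2` through the environment variable is also dropped on an older JVM rather than failing the suite.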
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #56337 from LuciferYang/worktree-SPARK-hive42-metastore.
Authored-by: YangJie <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
---
.../src/main/resources/error/error-conditions.json | 6 ++++
docs/sql-data-sources-hive-tables.md | 3 +-
docs/sql-migration-guide.md | 2 +-
.../spark/sql/errors/QueryExecutionErrors.scala | 12 ++++++++
.../org/apache/spark/sql/hive/HiveUtils.scala | 3 +-
.../spark/sql/hive/client/HiveClientImpl.scala | 1 +
.../apache/spark/sql/hive/client/HiveShim.scala | 2 ++
.../sql/hive/client/IsolatedClientLoader.scala | 9 ++++++
.../org/apache/spark/sql/hive/client/package.scala | 32 +++++++++++++++++++++-
.../spark/sql/hive/client/HiveClientVersions.scala | 14 ++++++++--
10 files changed, 78 insertions(+), 6 deletions(-)
diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json
index 146db79e4cc6..4e66798d4039 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -8453,6 +8453,12 @@
],
"sqlState" : "42K0E"
},
+ "UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA" : {
+ "message" : [
+ "Hive metastore version <version> requires Java <requiredJavaVersion> or later, but the current JVM is Java <currentJavaVersion>. Please upgrade your Java version or use an earlier Hive metastore version."
+ ],
+ "sqlState" : "0A000"
+ },
"UNSUPPORTED_INSERT" : {
"message" : [
"Can't insert into the target."
diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md
index 977efa8f2433..f4587ee41eaa 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -130,7 +130,8 @@ The following options can be used to configure the version of Hive that is used
<td><code>2.3.10</code></td>
<td>
Version of the Hive metastore. Available
- options are <code>2.0.0</code> through <code>2.3.10</code>, <code>3.0.0</code> through <code>3.1.3</code>, and <code>4.0.0</code> through <code>4.1.0</code>.
+ options are <code>2.0.0</code> through <code>2.3.10</code>, <code>3.0.0</code> through <code>3.1.3</code>, and <code>4.0.0</code> through <code>4.2.0</code>.
+ Note: Hive 4.2 requires Java 21 or later.
</td>
<td>1.4.0</td>
</tr>
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 265e933c0912..f0a1ed352fdf 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -1073,7 +1073,7 @@ Python UDF registration is unchanged.
Spark SQL is designed to be compatible with the Hive Metastore, SerDes and UDFs.
Currently, Hive SerDes and UDFs are based on built-in Hive,
and Spark SQL can be connected to different versions of Hive Metastore
-(from 2.0.0 to 2.3.10 and 3.0.0 to 4.1.0. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore).
+(from 2.0.0 to 2.3.10 and 3.0.0 to 4.2.0). Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore).
#### Deploying in Existing Hive Warehouses
{:.no_toc}
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 48edb6e38126..5caf1914c04b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1727,6 +1727,18 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
"key" -> key))
}
+ def unsupportedHiveMetastoreVersionForJavaError(
+ version: String,
+ requiredJavaVersion: Int,
+ currentJavaVersion: Int): SparkUnsupportedOperationException = {
+ new SparkUnsupportedOperationException(
+ errorClass = "UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA",
+ messageParameters = Map(
+ "version" -> version,
+ "requiredJavaVersion" -> requiredJavaVersion.toString,
+ "currentJavaVersion" -> currentJavaVersion.toString))
+ }
+
def loadHiveClientCausesNoClassDefFoundError(
cnf: NoClassDefFoundError,
execJars: Seq[URL],
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
index 4028da153ff9..d4452ae83fa7 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@@ -77,7 +77,8 @@ private[spark] object HiveUtils extends Logging {
.doc("Version of the Hive metastore. Available options are " +
"<code>2.0.0</code> through <code>2.3.10</code>, " +
"<code>3.0.0</code> through <code>3.1.3</code> and " +
- "<code>4.0.0</code> through <code>4.1.0</code>.")
+ "<code>4.0.0</code> through <code>4.2.0</code>. " +
+ "Note: Hive 4.2 requires Java 21 or later.")
.version("1.4.0")
.stringConf
.checkValue(isCompatibleHiveVersion, "Unsupported Hive Metastore version")
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 21db79116b52..dc6c65facadb 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -129,6 +129,7 @@ private[hive] class HiveClientImpl(
case hive.v3_1 => new Shim_v3_1()
case hive.v4_0 => new Shim_v4_0()
case hive.v4_1 => new Shim_v4_1()
+ case hive.v4_2 => new Shim_v4_2()
}
// Create an internal session state for this HiveClientImpl.
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
index ef27669f5ba0..32d892883697 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
@@ -1545,3 +1545,5 @@ private[client] class Shim_v4_0 extends Shim_v3_1 {
}
private[client] class Shim_v4_1 extends Shim_v4_0
+
+private[client] class Shim_v4_2 extends Shim_v4_1
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index eb649d196ff6..fbe8be59ae51 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -102,6 +102,7 @@ private[hive] object IsolatedClientLoader extends Logging {
case (3, 1, _) => Some(hive.v3_1)
case (4, 0, _) => Some(hive.v4_0)
case (4, 1, _) => Some(hive.v4_1)
+ case (4, 2, _) => Some(hive.v4_2)
case _ => None
}.getOrElse {
throw QueryExecutionErrors.unsupportedHiveMetastoreVersionError(
@@ -206,6 +207,14 @@ private[hive] class IsolatedClientLoader(
val barrierPrefixes: Seq[String] = Seq.empty)
extends Logging {
+ // Hive 4.2 requires Java 21 or later. The guard lives on the client-construction path rather
+ // than in IsolatedClientLoader.hiveVersion, which is also used for version-string validation,
+ // so the actionable message reaches the user instead of being swallowed.
+ if (version == hive.v4_2 && !Utils.isJavaVersionAtLeast21) {
+ throw QueryExecutionErrors.unsupportedHiveMetastoreVersionForJavaError(
+ version.fullVersion, 21, Runtime.version().feature())
+ }
+
/**
* This controls whether the generated clients maintain an independent/isolated copy of the
* Hive `SessionState`. If false, the Hive will leverage the global/static copy of
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
index 24ccbc7cbac4..3fef6dc5593c 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
@@ -129,8 +129,38 @@ package object client {
}
})
+ // Hive 4.2 ships with datanucleus-api-jdo:6.0.3 (down from 6.0.5), datanucleus-core:6.0.10
+ // (down from 6.0.11), and javax.jdo:3.2.0-release (down from 3.2.1) relative to Hive 4.1.
+ // These reflect the actual Hive 4.2 POM and must not be "upgraded" to v4_1 values.
+ // Derby was bumped to 10.17.1.0 (from 10.14.1.0 in v4_1) for Java 21 compatibility.
+ case object v4_2 extends HiveVersion("4.2.0",
+ extraDeps =
+ "org.antlr:antlr4-runtime:4.9.3" ::
+ "org.apache.derby:derby:10.17.1.0" ::
+ "org.apache.hadoop:hadoop-hdfs:3.4.1" ::
+ "org.datanucleus:datanucleus-api-jdo:6.0.3" ::
+ "org.datanucleus:datanucleus-core:6.0.10" ::
+ "org.datanucleus:datanucleus-rdbms:6.0.10" ::
+ "org.datanucleus:javax.jdo:3.2.0-release" ::
+ "org.springframework:spring-core:5.3.39" ::
+ "org.springframework:spring-jdbc:5.3.39" :: Nil,
+ exclusions =
+ "org.apache.curator:*" ::
+ "org.apache.hive:hive-service-rpc" ::
+ "org.apache.zookeeper:zookeeper" :: Nil ++
+ {
+ if (!Utils.isTesting) {
+ // HiveClientImpl#runHive which is used for testing refers
+ // `org.apache.hadoop.hive.ql.DriverContext` indirectly and `DriverContext` refers
+ // Tez APIs.
+ Seq("org.apache.tez:tez-api")
+ } else {
+ Seq.empty
+ }
+ })
+
val allSupportedHiveVersions: Set[HiveVersion] =
- Set(v2_0, v2_1, v2_2, v2_3, v3_0, v3_1, v4_0, v4_1)
+ Set(v2_0, v2_1, v2_2, v2_3, v3_0, v3_1, v4_0, v4_1, v4_2)
}
// scalastyle:on
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
index c06e2dea40f9..0ea4834839e5 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala
@@ -17,11 +17,21 @@
package org.apache.spark.sql.hive.client
+import org.apache.spark.util.Utils
+
private[client] trait HiveClientVersions {
private val testVersions = sys.env.get("SPARK_TEST_HIVE_CLIENT_VERSIONS")
- protected val versions = if (testVersions.nonEmpty) {
+ private val allVersions = if (testVersions.nonEmpty) {
testVersions.get.split(",").map(_.trim).filter(_.nonEmpty).toIndexedSeq
} else {
- IndexedSeq("2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1")
+ IndexedSeq("2.0", "2.1", "2.2", "2.3", "3.0", "3.1", "4.0", "4.1", "4.2")
+ }
+
+ protected val versions: IndexedSeq[String] = {
+ if (Utils.isJavaVersionAtLeast21) {
+ allVersions
+ } else {
+ allVersions.filterNot(_ == "4.2")
+ }
}
}