This is an automated email from the ASF dual-hosted git repository.
kejia pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new f8bc285520 [GLUTEN-10103][VL] Fall back to vanilla spark when UnresolvedException occurs in the schema validation (#10138)
f8bc285520 is described below
commit f8bc285520b223897d1eea9d5d5f72bfece6b220
Author: JiaKe <[email protected]>
AuthorDate: Tue Jul 15 18:40:06 2025 +0800
[GLUTEN-10103][VL] Fall back to vanilla spark when UnresolvedException occurs in the schema validation (#10138)
---
docs/velox-backend-limitations.md | 7 ++++++
.../apache/gluten/execution/ValidatablePlan.scala | 26 +++++++++++++++++-----
2 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/docs/velox-backend-limitations.md b/docs/velox-backend-limitations.md
index 161b1bfc71..a2b76456d5 100644
--- a/docs/velox-backend-limitations.md
+++ b/docs/velox-backend-limitations.md
@@ -163,3 +163,10 @@ Gluten's.
### CSV Read
The header option should be true. Currently we only support DataSourceV1, i.e.,
the user should set `spark.sql.sources.useV1SourceList=csv`. User-defined read
options are not supported, which will make CSV read fall back to vanilla Spark
in most cases.
CSV read will also fall back to vanilla Spark and log a warning when the
user-specified schema differs from the file schema.
+
+### Utilizing Map Type as Hash Keys in ColumnarShuffleExchange
+Spark uses the `spark.sql.legacy.allowHashOnMapType` configuration to support hashing on map-type keys.
+Gluten enables this configuration during the creation of ColumnarShuffleExchange, as shown in the code [link](https://github.com/apache/incubator-gluten/blob/0dacac84d3bf3d2759a5dd7e0735147852d2845d/backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxSparkPlanExecApi.scala#L355-L363).
+This bypasses Spark's unresolved checks and creates projects with the hash(mapType) operator before ColumnarShuffleExchange.
+However, if `spark.sql.legacy.allowHashOnMapType` is disabled in a test environment, projects using the hash(mapType) expression may throw an `Invalid call to dataType on unresolved object` exception during validation, causing them to fall back to vanilla Spark, as referenced in the code [link](https://github.com/apache/spark/blob/de5fa426e23b84fc3c2bddeabcd2e1eda515abd5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L291-L296).
+Enabling this configuration allows the project to be offloaded to Velox.
diff --git a/gluten-substrait/src/main/scala/org/apache/gluten/execution/ValidatablePlan.scala b/gluten-substrait/src/main/scala/org/apache/gluten/execution/ValidatablePlan.scala
index 40a403327f..53cf968bd9 100644
--- a/gluten-substrait/src/main/scala/org/apache/gluten/execution/ValidatablePlan.scala
+++ b/gluten-substrait/src/main/scala/org/apache/gluten/execution/ValidatablePlan.scala
@@ -23,6 +23,8 @@ import org.apache.gluten.expression.TransformerState
import org.apache.gluten.logging.LogLevelUtil
import org.apache.gluten.test.TestStats
+import org.apache.spark.sql.catalyst.analysis.UnresolvedException
+
/**
 * Base interface for a Gluten query plan that is also open to validation calls.
*
@@ -63,13 +65,25 @@ trait ValidatablePlan extends GlutenPlan with LogLevelUtil {
 * Validate whether this SparkPlan supports being transformed into a Substrait node in native code.
*/
final def doValidate(): ValidationResult = {
- val schemaValidationResult = BackendsApiManager.getValidatorApiInstance
- .doSchemaValidate(schema)
- .map {
- reason =>
- ValidationResult.failed(s"Found schema check failure for $schema, due to: $reason")
+ val schemaValidationResult =
+ try {
+ BackendsApiManager.getValidatorApiInstance
+ .doSchemaValidate(schema)
+ .map {
+ reason =>
+ ValidationResult.failed(s"Found schema check failure for $schema, due to: $reason")
+ }
+ .getOrElse(ValidationResult.succeeded)
+ } catch {
+ case u: UnresolvedException =>
+ val message =
+ s"Failed to retrieve schema, due to: ${u.getMessage}." +
+ s" If you are using a hash expression with a map key," +
+ s" consider enabling the spark.sql.legacy.allowHashOnMapType " +
+ s"setting to resolve this issue."
+ ValidationResult.failed(message)
}
- .getOrElse(ValidationResult.succeeded)
+
if (!schemaValidationResult.ok()) {
TestStats.addFallBackClassName(this.getClass.toString)
if (validationFailFast) {
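Outside the diff context, the new control flow in `doValidate()` can be illustrated with a minimal, self-contained Scala sketch. The `ValidationResult` and `UnresolvedException` types below are simplified stand-ins for the real Gluten and Spark classes, not their actual definitions:

```scala
// Stand-ins for Gluten's validation result type, for illustration only.
sealed trait ValidationResult { def ok(): Boolean }
case object Succeeded extends ValidationResult { def ok(): Boolean = true }
final case class Failed(reason: String) extends ValidationResult {
  def ok(): Boolean = false
}

// Stand-in for Spark's UnresolvedException, which is thrown when dataType is
// requested on an unresolved expression such as hash(mapType).
final class UnresolvedException(msg: String) extends RuntimeException(msg)

// doSchemaValidate returns Some(reason) on failure, None on success, and may
// throw UnresolvedException while the schema is being resolved.
def validateSchema(doSchemaValidate: () => Option[String]): ValidationResult =
  try {
    doSchemaValidate()
      .map(reason => Failed(s"Found schema check failure, due to: $reason"))
      .getOrElse(Succeeded)
  } catch {
    case u: UnresolvedException =>
      // Convert the hard failure into a validation failure so the plan
      // falls back to vanilla Spark instead of aborting the query.
      Failed(
        s"Failed to retrieve schema, due to: ${u.getMessage}." +
          " If you are using a hash expression with a map key, consider" +
          " enabling the spark.sql.legacy.allowHashOnMapType setting" +
          " to resolve this issue.")
  }
```

The key design point of the patch is this conversion: before the change, an `UnresolvedException` raised inside `doSchemaValidate` propagated out of `doValidate()` and failed the query; after the change, it is caught and reported as an ordinary `ValidationResult` failure, which triggers the normal fallback path.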
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]