This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 722d02ccbda7 [SPARK-52484][SQL] Skip child.supportsColumnar assertion 
from driver side in ColumnarToRowExec
722d02ccbda7 is described below

commit 722d02ccbda7157226a389c8e6fcceef4978c1e6
Author: Hongze Zhang <hongze.zzz...@gmail.com>
AuthorDate: Wed Jun 18 11:33:02 2025 +0800

    [SPARK-52484][SQL] Skip child.supportsColumnar assertion from driver side 
in ColumnarToRowExec
    
    https://issues.apache.org/jira/browse/SPARK-52484
    
    ### What changes were proposed in this pull request?
    
    The PR removes the unnecessary assertion in `ColumnarToRowExec` introduced 
by https://github.com/apache/spark/pull/25264, in order to give third-party 
Spark plugins more flexibility. In Apache Gluten in particular, the assertion 
blocks some of our query-optimization work because we need an intermediate 
state of the query plan that Spark would consider illegal.
    
    Some typical reasons this intermediate state is needed in Gluten:
    
    1. Gluten has a cost evaluator API that evaluates the cost of a `transition 
rule` (a rule that adds a unary node on top of an input plan). In that case 
Gluten needs a fake leaf node for the rule to apply to during cost evaluation. 
This leaf node has to be made columnar just to bypass the assertion, which is a 
bit hacky.
    2. Gluten has a cascades-style query optimizer (RAS) that can place a 
dummy, row-based leaf node to hide a child subtree of a branch node in the 
query plan; this leaf represents a so-called cascades 'group'. Although the 
resulting pattern (C2R on top of a row-based plan) is illegal, it can still 
serve as the input of an optimizer rule, to be matched on and then converted 
into a valid query plan.
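    The cascades-group pattern above can be sketched as follows. This is an 
illustrative-only model: the types `Plan`, `GroupLeaf`, `ColumnarScan`, `C2R`, 
and the rule `resolveGroups` are hypothetical names for this sketch, not 
Gluten's or Spark's real API.

    ```scala
    // A minimal model of a query plan tree. A GroupLeaf is a row-based dummy
    // leaf standing in for a cascades 'group' during optimizer enumeration.
    sealed trait Plan { def supportsColumnar: Boolean }
    case class GroupLeaf(groupId: Int) extends Plan { val supportsColumnar = false }
    case class ColumnarScan(table: String) extends Plan { val supportsColumnar = true }
    // C2R over a row-based child is temporarily illegal, but with the
    // assertion removed it can exist as an intermediate optimizer state.
    case class C2R(child: Plan) extends Plan { val supportsColumnar = false }

    // A rule can match the intermediate C2R-over-group pattern and substitute
    // a concrete columnar member of the group, yielding a valid plan.
    def resolveGroups(plan: Plan, groups: Map[Int, Plan]): Plan = plan match {
      case C2R(GroupLeaf(id)) => C2R(groups(id))
      case other              => other
    }
    ```

    Here `resolveGroups(C2R(GroupLeaf(1)), Map(1 -> ColumnarScan("t")))` 
rewrites the placeholder into `C2R(ColumnarScan("t"))`, a legal plan.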
    
    This PR removes the assertion to give third-party plugins that 
flexibility. This is harmless for upstream Apache Spark: even without the 
assertion, executing an illegal query plan still fails with [this 
error](https://github.com/apache/spark/blob/5d0b2f41794bf4dd25b3ce19bc4f634082b40876/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L343-L351).
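    The shift from an eager assertion to an execution-time failure can be 
sketched in simplified form. The names `PlanNode`, `DummyRowLeaf`, 
`ColumnarScanNode`, and `ColumnarToRowLike` are hypothetical stand-ins for 
this sketch, not Spark's real classes.

    ```scala
    // A minimal model of plan nodes and the ColumnarToRow transition.
    sealed trait PlanNode { def supportsColumnar: Boolean }
    case object DummyRowLeaf extends PlanNode { val supportsColumnar = false }
    case object ColumnarScanNode extends PlanNode { val supportsColumnar = true }

    case class ColumnarToRowLike(child: PlanNode) extends PlanNode {
      // The removed driver-side assertion would have lived here:
      // assert(child.supportsColumnar)
      val supportsColumnar = false

      // Without the eager assertion, the invalid combination is still caught
      // when the plan is executed, mirroring Spark's internal-error check.
      def execute(): Unit =
        if (!child.supportsColumnar) {
          throw new IllegalStateException(
            s"Internal error: $child does not support columnar execution")
        }
    }
    ```

    Constructing `ColumnarToRowLike(DummyRowLeaf)` now succeeds, so the 
intermediate state is allowed; calling `execute()` on it still fails, while 
`ColumnarToRowLike(ColumnarScanNode).execute()` runs normally.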
    
    Some workarounds used by Gluten for bypassing this assertion:
    
    1. 
https://github.com/apache/incubator-gluten/blob/0a1b5c28678653242ab0fd7b28ebba1dca43ccb1/gluten-core/src/main/scala/org/apache/gluten/extension/columnar/transition/package.scala#L83
    2. 
https://github.com/apache/incubator-gluten/blob/0a1b5c28678653242ab0fd7b28ebba1dca43ccb1/gluten-core/src/main/scala/org/apache/gluten/extension/columnar/enumerated/planner/plan/GlutenPlanModel.scala#L51-L55
    
    Once the assertion is removed, Gluten will be able to drop these 
workarounds and simplify its code.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Basically no. When an illegal query plan is generated, the assertion error 
at plan-building time is replaced by an exception at execution time (still 
raised on the driver side).
    
    ### How was this patch tested?
    
    Existing UTs.
    
    Closes #51183 from zhztheplayer/wip-rm-c2r-check.
    
    Authored-by: Hongze Zhang <hongze.zzz...@gmail.com>
    Signed-off-by: Kent Yao <y...@apache.org>
---
 sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala | 4 ----
 1 file changed, 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala
index 4c9ae155ec17..85ed1a24fc6e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala
@@ -32,7 +32,6 @@ import org.apache.spark.sql.execution.metric.{SQLMetric, 
SQLMetrics}
 import org.apache.spark.sql.execution.vectorized.WritableColumnVector
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.vectorized.{ColumnarBatch, ColumnVector}
-import org.apache.spark.util.Utils
 
 /**
  * Holds a user defined rule that can be used to inject columnar 
implementations of various
@@ -66,9 +65,6 @@ trait ColumnarToRowTransition extends UnaryExecNode
  * [[MapPartitionsInRWithArrowExec]]. Eventually this should replace those 
implementations.
  */
 case class ColumnarToRowExec(child: SparkPlan) extends ColumnarToRowTransition 
with CodegenSupport {
-  // supportsColumnar requires to be only called on driver side, see also 
SPARK-37779.
-  assert(Utils.isInRunningSparkTask || child.supportsColumnar)
-
   override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
