advancedxy commented on code in PR #100: URL: https://github.com/apache/arrow-datafusion-comet/pull/100#discussion_r1504400681
########## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ########## @@ -451,6 +468,23 @@ class CometSparkSessionExtensions } } } + + // CometExec already wraps a `ColumnarToRowExec` for row-based operators. Therefore, + // `ColumnarToRowExec` is redundant and can be eliminated. + // + // It was added during ApplyColumnarRulesAndInsertTransitions' insertTransitions phase when Spark + // requests row-based output such as `collect` call. It's correct to add a redundant + // `ColumnarToRowExec` for `CometExec`. However, for certain operators such as + // `CometCollectLimitExec` which overrides `executeCollect`, the redundant `ColumnarToRowExec` + // makes the override ineffective. The purpose of this rule is to eliminate the redundant + // `ColumnarToRowExec` for such operators. + case class EliminateRedundantColumnarToRow(session: SparkSession) extends Rule[SparkPlan] { Review Comment: I added an assert in the test file, which should illustrate the basic idea. ```scala // make sure the root node is CometCollectLimitExec assert(qe.executedPlan.isInstanceOf[CometCollectLimitExec]) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org