advancedxy commented on code in PR #206:
URL: https://github.com/apache/arrow-datafusion-comet/pull/206#discussion_r1555973542
##########
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##########
@@ -574,18 +580,26 @@ class CometSparkSessionExtensions
}
}
- // CometExec already wraps a `ColumnarToRowExec` for row-based operators. Therefore,
- // `ColumnarToRowExec` is redundant and can be eliminated.
+ // This rule is responsible for eliminating redundant transitions between row-based and
+ // columnar-based operators for Comet. Currently, two potential redundant transitions are:
+ // 1. ColumnarToRowExec at the end of a Spark operator, which is redundant for Comet operators as
+ //    CometExec already wraps a `ColumnarToRowExec` for row-based operators.
+ // 2. Consecutive operators of CometRowToColumnarExec and ColumnarToRowExec, which might be
+ //    possible for Comet to add a `CometRowToColumnarExec` for row-based operators first, then
+ //    Spark only requests row-based output.
Review Comment:
> but the downstream operator is Spark operator which takes row-based input

Hmm, this is another possibility. Let me update the comment to include this one.

The case I described above is that Spark requests only row-based output at the end of the operator; the row-based requirement might then be passed down to the `CometRowToColumnarExec`, which leaves us with a consecutive pair of `CometRowToColumnarExec` and `ColumnarToRowExec`.
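
To make the consecutive-pair case concrete, here is a minimal toy sketch of the idea. It is not the rule in this PR: `Plan`, `RowScan`, `RowOp`, `ColumnarOp`, `RowToColumnar`, `ColumnarToRow` and `eliminate` are all hypothetical stand-ins I made up for illustration, with `RowToColumnar` playing the role of `CometRowToColumnarExec` and `ColumnarToRow` the role of `ColumnarToRowExec`. The only thing it shows is that a row-to-columnar transition immediately followed by a columnar-to-row transition can be dropped, since the child already produces rows.

```scala
// Toy plan model. Every name here is a hypothetical stand-in,
// not an actual Spark or Comet class.
object TransitionSketch {
  sealed trait Plan
  case class RowScan(table: String) extends Plan                // row-based leaf
  case class RowOp(name: String, child: Plan) extends Plan      // row-based operator
  case class ColumnarOp(name: String, child: Plan) extends Plan // columnar operator
  case class RowToColumnar(child: Plan) extends Plan            // role of CometRowToColumnarExec
  case class ColumnarToRow(child: Plan) extends Plan            // role of ColumnarToRowExec

  // Bottom-up rewrite: normalize the children first, then check whether
  // the current node is a ColumnarToRow sitting directly on a RowToColumnar.
  def eliminate(plan: Plan): Plan = {
    val rewritten = plan match {
      case RowScan(_)       => plan
      case RowOp(n, c)      => RowOp(n, eliminate(c))
      case ColumnarOp(n, c) => ColumnarOp(n, eliminate(c))
      case RowToColumnar(c) => RowToColumnar(eliminate(c))
      case ColumnarToRow(c) => ColumnarToRow(eliminate(c))
    }
    rewritten match {
      // rows -> columnar -> rows is a no-op, so keep only the row-producing child
      case ColumnarToRow(RowToColumnar(child)) => child
      case other                               => other
    }
  }

  def main(args: Array[String]): Unit = {
    // Only row-based output is requested on top of a row-to-columnar transition.
    val redundant = RowOp("collect", ColumnarToRow(RowToColumnar(RowScan("t"))))
    assert(eliminate(redundant) == RowOp("collect", RowScan("t")))

    // A columnar operator sits between the two transitions, so both are needed.
    val needed = ColumnarToRow(ColumnarOp("filter", RowToColumnar(RowScan("t"))))
    assert(eliminate(needed) == needed)
  }
}
```

The sketch rewrites bottom-up so that nested pairs collapse in a single pass; whether the real rule needs that depends on how the plan is traversed, which this toy does not model.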