viirya commented on code in PR #206:
URL:
https://github.com/apache/arrow-datafusion-comet/pull/206#discussion_r1555352583
##########
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##########
@@ -574,18 +580,26 @@ class CometSparkSessionExtensions
}
}
- // CometExec already wraps a `ColumnarToRowExec` for row-based operators. Therefore,
- // `ColumnarToRowExec` is redundant and can be eliminated.
+ // This rule is responsible for eliminating redundant transitions between row-based and
+ // columnar-based operators for Comet. Currently, two potential redundant transitions are:
+ // 1. ColumnarToRowExec at the end of a Spark operator, which is redundant for Comet operators as
+ //    CometExec already wraps a `ColumnarToRowExec` for row-based operators.
+ // 2. Consecutive operators of CometRowToColumnarExec and ColumnarToRowExec, which might be
+ //    possible for Comet to add a `CometRowToColumnarExec` for row-based operators first, then
+ //    Spark only requests row-based output.
Review Comment:
Do you actually mean:
```
Comet adds `CometRowToColumnarExec` on top of row-based data scan operators, but the
downstream operator is a Spark operator which takes row-based input. So Spark adds another
`ColumnarToRowExec` after `CometRowToColumnarExec`. In this case, we remove the pair of
`CometRowToColumnarExec` and `ColumnarToRowExec`.
```
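To make the case concrete, here is a minimal sketch of the pair elimination on a toy plan tree. It is not the PR's implementation and does not use Spark's `SparkPlan` API: `Plan`, `RowScan`, `RowFilter`, `RowToColumnar`, `ColumnarToRow`, and `EliminateRedundantTransitions` are all hypothetical stand-ins, with `RowToColumnar` playing the role of `CometRowToColumnarExec` and `ColumnarToRow` the role of `ColumnarToRowExec`.

```scala
// Toy plan nodes. All names are hypothetical stand-ins, not Spark or Comet classes.
sealed trait Plan
case class RowScan(name: String) extends Plan                 // row-based data scan
case class RowFilter(cond: String, child: Plan) extends Plan  // row-based Spark operator
case class RowToColumnar(child: Plan) extends Plan            // stands in for CometRowToColumnarExec
case class ColumnarToRow(child: Plan) extends Plan            // stands in for ColumnarToRowExec

object EliminateRedundantTransitions {
  // Bottom-up rewrite: children are rewritten first, then the current node is checked.
  def apply(plan: Plan): Plan = {
    val rewritten = plan match {
      case RowToColumnar(c)   => RowToColumnar(apply(c))
      case ColumnarToRow(c)   => ColumnarToRow(apply(c))
      case RowFilter(cond, c) => RowFilter(cond, apply(c))
      case leaf: RowScan      => leaf
    }
    rewritten match {
      // Row -> columnar immediately followed by columnar -> row is a no-op pair:
      // drop both transitions and keep the row-based child.
      case ColumnarToRow(RowToColumnar(child)) => child
      case other                               => other
    }
  }
}

// A row-based scan wrapped in both transitions, feeding a row-based operator.
val before = RowFilter("a > 1", ColumnarToRow(RowToColumnar(RowScan("t"))))
val after  = EliminateRedundantTransitions(before)
```

In this toy model, `after` is the filter sitting directly on the scan, which matches the "remove the pair" wording suggested above.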