Re: [PR] feat: Add CometRowToColumnar operator [arrow-datafusion-comet]

via GitHub Mon, 08 Apr 2024 00:35:48 -0700


viirya commented on code in PR #206:
URL: https://github.com/apache/arrow-datafusion-comet/pull/206#discussion_r1555352583



##########
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##########
@@ -574,18 +580,26 @@ class CometSparkSessionExtensions
     }
   }
 
-  // CometExec already wraps a `ColumnarToRowExec` for row-based operators. Therefore,
-  // `ColumnarToRowExec` is redundant and can be eliminated.
+  // This rule is responsible for eliminating redundant transitions between row-based and
+  // columnar-based operators for Comet. Currently, two potential redundant transitions are:
+  // 1. ColumnarToRowExec at the end of a Spark operator, which is redundant for Comet operators as
+  //    CometExec already wraps a `ColumnarToRowExec` for row-based operators.
+  // 2. Consecutive operators of CometRowToColumnarExec and ColumnarToRowExec, which might be
+  //    possible for Comet to add a `CometRowToColumnarExec` for row-based operators first, then
+  //    Spark only requests row-based output.

Review Comment:
   Do you actually mean:
   
   ```
   Comet adds `CometRowToColumnarExec` on top of row-based data scan operators, but the
   downstream operator is a Spark operator that takes row-based input. So Spark adds another
   `ColumnarToRowExec` after `CometRowToColumnarExec`. In this case, we remove the pair of
   `CometRowToColumnarExec` and `ColumnarToRowExec`.
   ```
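
To make the suggested wording concrete, here is a minimal sketch of the pair elimination on a toy plan tree. Every type and name in it (`Plan`, `RowScan`, `RowToColumnar`, `ColumnarToRow`, `RowOp`, `EliminateRedundantTransitions`) is a hypothetical stand-in invented for illustration; it does not use Spark's `SparkPlan` API and is not the rule as implemented in this PR.

```scala
// Toy plan nodes. These are hypothetical stand-ins, not Spark or Comet classes.
sealed trait Plan
case class RowScan(table: String) extends Plan             // row-based data scan
case class RowToColumnar(child: Plan) extends Plan         // stands in for CometRowToColumnarExec
case class ColumnarToRow(child: Plan) extends Plan         // stands in for ColumnarToRowExec
case class RowOp(name: String, child: Plan) extends Plan   // Spark operator taking row-based input

object EliminateRedundantTransitions {
  // Rewrites bottom-up: children are normalized first, then a ColumnarToRow
  // sitting directly on a RowToColumnar is dropped together with it, because
  // the pair converts rows to columns and straight back to rows.
  def apply(plan: Plan): Plan = plan match {
    case ColumnarToRow(child) =>
      apply(child) match {
        case RowToColumnar(grandChild) => grandChild  // remove the redundant pair
        case other                     => ColumnarToRow(other)
      }
    case RowToColumnar(child) => RowToColumnar(apply(child))
    case RowOp(name, child)   => RowOp(name, apply(child))
    case scan: RowScan        => scan
  }
}
```

In this toy model, a plan such as `RowOp("Project", ColumnarToRow(RowToColumnar(RowScan("t"))))` is rewritten so that the row-based operator reads from the scan directly, while a lone `RowToColumnar` or `ColumnarToRow` is left in place.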



