Re: [PR] feat: Add CometRowToColumnar operator [arrow-datafusion-comet]

via GitHub Mon, 08 Apr 2024 07:49:09 -0700


advancedxy commented on code in PR #206:
URL: https://github.com/apache/arrow-datafusion-comet/pull/206#discussion_r1555973542



##########
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala:
##########
@@ -574,18 +580,26 @@ class CometSparkSessionExtensions
     }
   }
 
-  // CometExec already wraps a `ColumnarToRowExec` for row-based operators. Therefore,
-  // `ColumnarToRowExec` is redundant and can be eliminated.
+  // This rule is responsible for eliminating redundant transitions between row-based and
+  // columnar-based operators for Comet. Currently, two potential redundant transitions are:
+  // 1. ColumnarToRowExec at the end of a Spark operator, which is redundant for Comet operators as
+  //    CometExec already wraps a `ColumnarToRowExec` for row-based operators.
+  // 2. Consecutive operators of CometRowToColumnarExec and ColumnarToRowExec, which might be
+  //    possible for Comet to add a `CometRowToColumnarExec` for row-based operators first, then
+  //    Spark only requests row-based output.

Review Comment:
   > but the downstream operator is Spark operator which takes row-based input

   Hmm, this is another possibility. Let me update the comment to include this one.
   The case I described above is that Spark only requests row-based output at the end of the operator. That row-based requirement might be passed down to the `CometRowToColumnarExec`, which leaves a consecutive pair of `CometRowToColumnarExec` and `ColumnarToRowExec`.
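
To make the second case concrete, here is a minimal sketch of the pairing being discussed. It assumes a toy plan model: `Plan`, `RowScan`, `RowFilter`, `RowToColumnar`, `ColumnarToRow`, and `EliminateRedundantTransitions` below are hypothetical stand-ins written for illustration, not the Spark or Comet classes, and this is not the rule as implemented in the PR.

```scala
// Toy plan model. All names here are hypothetical stand-ins for illustration only;
// they are NOT Spark's or Comet's classes.
sealed trait Plan
case class RowScan(name: String) extends Plan       // leaf that produces rows
case class RowFilter(child: Plan) extends Plan      // row-based operator
case class RowToColumnar(child: Plan) extends Plan  // stands in for CometRowToColumnarExec
case class ColumnarToRow(child: Plan) extends Plan  // stands in for ColumnarToRowExec

object EliminateRedundantTransitions {
  def apply(plan: Plan): Plan = {
    // Rewrite the children first (bottom-up), so a pair exposed by an inner
    // rewrite is still seen at this level.
    val rewritten = plan match {
      case ColumnarToRow(child) => ColumnarToRow(apply(child))
      case RowToColumnar(child) => RowToColumnar(apply(child))
      case RowFilter(child)     => RowFilter(apply(child))
      case scan: RowScan        => scan
    }
    rewritten match {
      // rows -> columnar -> rows cancels out: keep only the row-based child.
      case ColumnarToRow(RowToColumnar(child)) => child
      case other                               => other
    }
  }
}
```

With this toy model, `RowFilter(ColumnarToRow(RowToColumnar(RowScan("t"))))` is rewritten to `RowFilter(RowScan("t"))`, while a lone `RowToColumnar` or `ColumnarToRow` is left in place.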



