andygrove opened a new issue, #757:
URL: https://github.com/apache/datafusion-comet/issues/757

   ### What is the problem the feature request solves?
   
   Because `FilterExec` can sometimes return its input vectors without copying 
them (in the case where the predicate evaluates to true for all rows in the 
batch), we have to wrap it in a `CopyExec` when it is used as the input 
to a join:
   
   ```rust
   // DataFusion Join operators keep the input batch internally. We need
   // to copy the input batch to avoid the data corruption from reusing the
   // input batch.
   let left = if can_reuse_input_batch(&left) {
       Arc::new(CopyExec::new(left))
   } else {
       left
   };
   ```
   
   In the case where the filter does not select all rows in the batch, it will 
make a copy of the selected rows, and then we copy them again in `CopyExec`. 
Perhaps we could avoid this redundant copy.
   
   
   ### Describe the potential solution
   
   One idea would be to modify `FilterExec` to add some metadata to the 
returned batch to indicate whether it is returning any original vectors, and 
then have `CopyExec` skip the copy when the metadata shows that it is not.
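
A minimal sketch of the idea, using plain standard-library types rather than the real Arrow or DataFusion APIs. Everything here is hypothetical and for illustration only: `Batch`, its `reuses_input` flag, `filter`, and `copy_if_needed` are stand-ins for a record batch, the proposed metadata, `FilterExec`, and `CopyExec` respectively.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a record batch: one shared column plus a flag.
#[derive(Clone)]
struct Batch {
    column: Arc<Vec<i32>>,
    /// Hypothetical metadata: true when `column` is an input buffer that was
    /// passed through unchanged and may therefore be reused upstream.
    reuses_input: bool,
}

/// Toy filter: returns the input buffer as-is when every row matches,
/// otherwise copies the selected rows into a fresh buffer.
fn filter(input: &Batch, predicate: impl Fn(i32) -> bool) -> Batch {
    if input.column.iter().all(|v| predicate(*v)) {
        // No copy made, so flag the batch as sharing the input buffer.
        Batch { column: Arc::clone(&input.column), reuses_input: true }
    } else {
        // The selected rows are already copied here.
        let selected: Vec<i32> =
            input.column.iter().copied().filter(|v| predicate(*v)).collect();
        Batch { column: Arc::new(selected), reuses_input: false }
    }
}

/// Toy copy step: deep-copies only when the batch is flagged as sharing
/// an input buffer, and otherwise passes it through untouched.
fn copy_if_needed(batch: Batch) -> Batch {
    if batch.reuses_input {
        Batch { column: Arc::new(batch.column.as_ref().clone()), reuses_input: false }
    } else {
        batch
    }
}
```

In this sketch, a batch filtered with a predicate that matches every row is deep-copied by `copy_if_needed`, while a partially filtered batch passes through with its buffer untouched, so the selected rows are copied only once. How such a flag would actually be carried on a batch between operators is left open here.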
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

