andygrove opened a new issue, #757: URL: https://github.com/apache/datafusion-comet/issues/757
### What is the problem the feature request solves?

Because `FilterExec` can sometimes return its input vectors without copying them (when the predicate evaluates to true for all rows in the batch), we have to wrap this exec in a `CopyExec` when using it as the input to a join:

```rust
// DataFusion Join operators keep the input batch internally. We need
// to copy the input batch to avoid the data corruption from reusing the input
// batch.
let left = if can_reuse_input_batch(&left) {
    Arc::new(CopyExec::new(left))
} else {
    left
};
```

When the filter does not select all rows in the batch, it makes a copy of the selected rows, and then `CopyExec` copies them again. Perhaps we could avoid this redundant copy.

### Describe the potential solution

One idea would be to modify `FilterExec` to add some metadata to the returned batch indicating whether it is returning any original vectors, and then have `CopyExec` skip the copy when the metadata shows that no original vectors were returned.

### Additional context

_No response_
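
To make the proposal concrete, here is a minimal sketch of the idea using only the standard library. It is not Comet or DataFusion code: `Batch`, the `reuses_input` flag, `filter`, and `copy_if_shared` are hypothetical stand-ins for a record batch, the proposed metadata, `FilterExec`, and `CopyExec`. It assumes the mask has one entry per row.

```rust
use std::sync::Arc;

/// Simplified stand-in for a record batch: each column is a shared buffer.
#[derive(Clone, Debug)]
struct Batch {
    columns: Vec<Arc<Vec<i32>>>,
    /// Hypothetical metadata flag: true when the columns are the producer's
    /// original input buffers rather than freshly allocated copies.
    reuses_input: bool,
}

/// Filter sketch: pass the input through untouched when every row is
/// selected, otherwise materialize the selected rows into new buffers.
fn filter(input: &Batch, mask: &[bool]) -> Batch {
    if mask.iter().all(|&keep| keep) {
        // Zero-copy path: the output shares buffers with the input.
        return Batch {
            columns: input.columns.clone(),
            reuses_input: true,
        };
    }
    // Copying path: the selected rows land in newly allocated buffers.
    let columns = input
        .columns
        .iter()
        .map(|col| {
            let selected: Vec<i32> = col
                .iter()
                .zip(mask)
                .filter(|(_, keep)| **keep)
                .map(|(value, _)| *value)
                .collect();
            Arc::new(selected)
        })
        .collect();
    Batch {
        columns,
        reuses_input: false,
    }
}

/// Copy sketch: deep-copy only when the batch still shares input buffers.
fn copy_if_shared(batch: Batch) -> Batch {
    if !batch.reuses_input {
        // The producer already copied the data, so a second copy is redundant.
        return batch;
    }
    let columns = batch
        .columns
        .iter()
        .map(|col| Arc::new((**col).clone()))
        .collect();
    Batch {
        columns,
        reuses_input: false,
    }
}
```

With this shape, the all-rows-selected case still gets its defensive copy before the join, while the partially selected case passes through `copy_if_shared` without a second allocation. How such a flag would actually be carried on a real batch is left open here.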