andygrove commented on PR #2156: URL: https://github.com/apache/datafusion-comet/pull/2156#issuecomment-3191484295
> Looks good. I don't see this changing the behavior of native_iceberg_compat though; I assume that is covered by the previous PR? I haven't made any changes specific to `native_iceberg_compat`. The current behavior, before this current PR, is that we attempt to make deep copies just before certain operators (such as `HashJoinExec`) if we detect that the origin of the arrays was a `ScanExec` reading from the JVM (regardless of scan type). There were two bugs in the implementation: - The method `can_reuse_input_batches` was poorly implemented and missed some cases. - We had missed calling `wrap_in_copy_exec` for `SortMergeJoinExec` So far this week, the following related PRs were merged: - https://github.com/apache/datafusion-comet/pull/2142 improved the logic in `can_reuse_input_batches` - https://github.com/apache/datafusion-comet/pull/2155 added calls to `wrap_in_copy_exec` for `SortMergeJoinExec` So the current implementation is _probably_ safe now. This PR makes the logic simpler and more robust by performing the deep copy directly in `ScanExec` and removing the `can_reuse_input_batches` method. The calls to `wrap_in_copy_exec` now just create an `UnpackOrClone` rather than `UnpackOrDeepCopy`. This also allows us to remove the custom Comet version of `FilterExec`, reducing the amount of code we have to maintain. This is arguably a sledgehammer approach to ensuring that we never have memory corruption issues due to underlying buffers being reused (in the mutable Parquet buffer) or prematurely freed due to FFI lifetime bugs, but it doesn't significantly add to the number of copies that we were already making. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org