andygrove commented on PR #2156:
URL: 
https://github.com/apache/datafusion-comet/pull/2156#issuecomment-3191484295

   > Looks good. I don't see this changing the behavior of 
native_iceberg_compat though; I assume that is covered by the previous PR?
   
   I haven't made any changes specific to `native_iceberg_compat`. 
   
   The current behavior, before this current PR, is that we attempt to make 
deep copies just before certain operators (such as `HashJoinExec`) if we detect 
that the origin of the arrays was a `ScanExec` reading from the JVM (regardless 
of scan type). There were two bugs in the implementation:
   
   - The method `can_reuse_input_batches` was poorly implemented and missed 
some cases.
   - We had missed calling `wrap_in_copy_exec` for `SortMergeJoinExec`
   
   So far this week, the following related PRs were merged:
   
   - https://github.com/apache/datafusion-comet/pull/2142 improved the logic in 
`can_reuse_input_batches`
   - https://github.com/apache/datafusion-comet/pull/2155 added calls to 
`wrap_in_copy_exec` for `SortMergeJoinExec`
   
   So the current implementation is _probably_ safe now.
   
   This PR makes the logic simpler and more robust by performing the deep copy 
directly in `ScanExec` and removing the `can_reuse_input_batches` method. The 
calls to `wrap_in_copy_exec` now just create an `UnpackOrClone` rather than 
`UnpackOrDeepCopy`. This also allows us to remove the custom Comet version of 
`FilterExec`, reducing the amount of code we have to maintain.
   
   This is arguably a sledgehammer approach to ensuring that we never have 
memory corruption issues due to underlying buffers being reused (in the mutable 
Parquet buffer) or prematurely freed due to FFI lifetime bugs, but it doesn't 
significantly add to the number of copies that we were already making.
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to