andygrove opened a new issue, #2162: URL: https://github.com/apache/datafusion-comet/issues/2162
### Describe the bug

Native code fetches batches from JVM using `CometBatchIterator`. This is called from `ScanExec`. We have seen memory corruption unless ScanExec takes a deep copy of the arrays received from `CometBatchIterator`.

On further analysis, it is now clear that the JVM is not retaining ownership of the arrays once they are exported to native. This means that the underlying Arrow buffers get released back to a pool and can be overwritten while native code is still referencing them.

I have been able to prove with some debug logging that the JVM closes CometVectors after exporting them and while native code is still processing the data, leading to corruption:

Native code (thread 1154171) gets a batch:

```
[1154171] native got batch from jvm: RecordBatch { schema: Schema { fields: [Field { name: "col_0", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "col_1", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int32> [ 6, 7, ], PrimitiveArray<Int32> [ 8, 9, ]], row_count: 2 }
```

JVM closes the vectors:

```
[Executor task launch worker for task 2.0 in stage 15.0 (TID 41)] CometVector.close() [6, 7]
[Executor task launch worker for task 2.0 in stage 15.0 (TID 41)] CometVector.close() [8, 9]
```

Native code (still thread 1154171) continues processing, but the buffer has been freed or overwritten.
```
[1154171] writing shuffle batch: RecordBatch { schema: Schema { fields: [Field { name: "col_0", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "col_1", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int32> [ -1342011280, 30839, ], PrimitiveArray<Int32> [ -1342175264, 30839, ]], row_count: 2 }
```

### Steps to reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_
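
The failure mode reported in this issue can be modeled in a few lines of safe Rust. This is only an illustrative sketch, not Comet code: `Producer`, `SharedBuf`, `export`, and `demo` are hypothetical names, and a recycled `Rc<RefCell<Vec<i32>>>` stands in for a pooled Arrow buffer that the producer reuses after export. It shows why a consumer that keeps a handle to the shared buffer can observe different data later, while a consumer that took a deep copy cannot.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a pooled buffer visible to both producer and consumer.
type SharedBuf = Rc<RefCell<Vec<i32>>>;

/// Hypothetical producer modeling the exporting side: it reuses one buffer for every batch.
struct Producer {
    buf: SharedBuf,
}

impl Producer {
    fn new() -> Self {
        Producer { buf: Rc::new(RefCell::new(Vec::new())) }
    }

    /// Overwrite the pooled buffer with a new batch and hand out a handle to it.
    fn export(&self, values: &[i32]) -> SharedBuf {
        let mut b = self.buf.borrow_mut();
        b.clear();
        b.extend_from_slice(values);
        Rc::clone(&self.buf)
    }
}

/// Returns (contents seen through the shared handle, contents of the deep copy)
/// after the producer has recycled its buffer for the next batch.
fn demo() -> (Vec<i32>, Vec<i32>) {
    let producer = Producer::new();

    // Consumer receives the first batch.
    let shared = producer.export(&[6, 7]);

    // Deep copy taken immediately on receipt.
    let deep_copy: Vec<i32> = shared.borrow().clone();

    // Producer moves on and reuses the same buffer while the consumer is still working.
    let _next = producer.export(&[100, 200]);

    // The shared handle now shows the new contents; the deep copy is unaffected.
    let shared_view: Vec<i32> = shared.borrow().clone();
    (shared_view, deep_copy)
}

fn main() {
    let (shared_view, deep_copy) = demo();
    println!("shared view after reuse: {:?}", shared_view);
    println!("deep copy after reuse:   {:?}", deep_copy);
}
```

In the real system the buffer is returned to an allocator rather than overwritten in place by a well-behaved owner, so the values read can be arbitrary, but the ownership question is the same: either the exporting side keeps the buffer alive until the consumer is done, or the consumer copies.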