andygrove opened a new issue, #2162:
URL: https://github.com/apache/datafusion-comet/issues/2162
### Describe the bug
Native code fetches batches from the JVM using `CometBatchIterator`, which is
called from `ScanExec`.
We have seen memory corruption unless `ScanExec` takes a deep copy of the
arrays received from `CometBatchIterator`. On further analysis, it is now clear
that the JVM does not retain ownership of the arrays once they are exported to
native code. As a result, the underlying Arrow buffers are released back to a
pool and can be overwritten while native code is still referencing them.
Debug logging confirms that the JVM closes `CometVector`s after exporting them,
while native code is still processing the data, which leads to corruption:
Native code (thread 1154171) gets a batch:
```
[1154171] native got batch from jvm: RecordBatch { schema: Schema { fields:
[Field { name: "col_0", data_type: Int32, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "col_1", data_type:
Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }],
metadata: {} }, columns: [PrimitiveArray<Int32>
[
6,
7,
], PrimitiveArray<Int32>
[
8,
9,
]], row_count: 2 }
```
JVM closes the vectors:
```
[Executor task launch worker for task 2.0 in stage 15.0 (TID 41)]
CometVector.close() [6, 7]
[Executor task launch worker for task 2.0 in stage 15.0 (TID 41)]
CometVector.close() [8, 9]
```
Native code (still thread 1154171) continues processing, but the buffer has
been freed or overwritten:
```
[1154171] writing shuffle batch: RecordBatch { schema: Schema { fields:
[Field { name: "col_0", data_type: Int32, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "col_1", data_type:
Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }],
metadata: {} }, columns: [PrimitiveArray<Int32>
[
-1342011280,
30839,
], PrimitiveArray<Int32>
[
-1342175264,
30839,
]], row_count: 2 }
```
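The sequence in the logs above can be modelled with a small, self-contained sketch. This is not Comet or Arrow code: `BufferPool`, `allocate`, `release`, `read`, and `simulate` are hypothetical names, and the pool is a simplified stand-in for the JVM-side allocator. It only illustrates why a consumer that keeps a handle sees different data after the producer closes the vector, while a consumer that took a deep copy does not.

```rust
/// Toy stand-in for a JVM-side buffer pool (hypothetical, not Comet's API).
struct BufferPool {
    slots: Vec<Vec<i32>>,
    free: Vec<usize>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { slots: Vec::new(), free: Vec::new() }
    }

    /// Store `values` in the pool, reusing a released slot when one exists.
    fn allocate(&mut self, values: &[i32]) -> usize {
        if let Some(slot) = self.free.pop() {
            // Reuse overwrites whatever the previous holder stored here.
            self.slots[slot] = values.to_vec();
            slot
        } else {
            self.slots.push(values.to_vec());
            self.slots.len() - 1
        }
    }

    /// Analogous to the producer closing a vector: the slot becomes reusable.
    fn release(&mut self, slot: usize) {
        self.free.push(slot);
    }

    /// Read through a handle; nothing checks whether the slot was released.
    fn read(&self, slot: usize) -> &[i32] {
        &self.slots[slot]
    }
}

/// Returns (data seen through the stale handle, data seen through a deep copy).
fn simulate() -> (Vec<i32>, Vec<i32>) {
    let mut pool = BufferPool::new();

    // Producer exports a batch; the consumer receives only a handle.
    let handle = pool.allocate(&[6, 7]);

    // Workaround: the consumer copies the data while it is still valid.
    let deep_copy = pool.read(handle).to_vec();

    // Producer closes the vector while the consumer is still working.
    pool.release(handle);

    // The next allocation reuses the same slot and overwrites its contents.
    let _next_batch = pool.allocate(&[-1, -2]);

    // The consumer reads through the stale handle and sees the new contents.
    let via_handle = pool.read(handle).to_vec();
    (via_handle, deep_copy)
}
```

In this model the stale handle silently returns the next batch's contents. With real Arrow buffers the read would touch freed or recycled memory, which matches the arbitrary values in the shuffle log.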
### Steps to reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_