Weston Pace created ARROW-15275:
-----------------------------------
Summary: [C++][JNI] DisposableScannerAdaptor does not handle arrays with offsets
Key: ARROW-15275
URL: https://issues.apache.org/jira/browse/ARROW-15275
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Weston Pace
The DisposableScannerAdaptor is a JNI bridge from Java to the C++ datasets API.
When it scans record batches, it collects all of the buffers from all of the
arrays and returns a list of buffer handles to Java, which puts them into an
ArrowRecordBatch.
Unfortunately, if an array has a non-zero offset, the bridge does not return
the portion of the buffer that starts at that offset; it returns the entire
underlying buffer. The Java record batch is then incorrect: the length is wrong
(so it doesn't fully free the memory) and the values are wrong.
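To make the failure concrete, here is a minimal sketch in plain Python. It is
not the Arrow API: Int32Array, export_ignoring_offset and read_on_receiver are
hypothetical names that only model a sliced int32 array whose whole buffer is
handed to a receiver that assumes the values start at byte 0.

```python
import struct
from dataclasses import dataclass


@dataclass
class Int32Array:
    """Toy stand-in for an int32 array: a view over a shared buffer."""
    data: bytes   # the whole underlying buffer
    offset: int   # logical offset, in elements
    length: int   # logical length, in elements

    def values(self):
        # Correct read: skip `offset` elements of 4 bytes each.
        start = self.offset * 4
        return list(struct.unpack_from(f"<{self.length}i", self.data, start))


def export_ignoring_offset(arr):
    """Models the bug: pass the whole buffer and only the logical length."""
    return arr.data, arr.length


def read_on_receiver(buf, length):
    """The receiver assumes the values start at byte 0 of the buffer."""
    return list(struct.unpack_from(f"<{length}i", buf, 0))


parent = struct.pack("<6i", 10, 11, 12, 13, 14, 15)
sliced = Int32Array(parent, offset=2, length=3)

buf, n = export_ignoring_offset(sliced)
print(sliced.values())           # [12, 13, 14]
print(read_on_receiver(buf, n))  # [10, 11, 12]
print(len(buf), n * 4)           # 24 12
```

The receiver sees the wrong values, and the 12 bytes it believes it owns do not
match the 24 bytes that were actually handed over.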
I'm not familiar enough with the Java implementation to suggest a good fix.
Figuring out the buffer offsets from the array offset is a bit tricky since the
logic depends on the data type. Also, I'm pretty sure the Java side would then
have to take ownership of the entire buffer, which could be tricky because
multiple batches could share ownership of that buffer.
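As a rough illustration of why the logic depends on the data type, the sketch
below computes the byte range each kind of buffer would need for a slice. It
assumes the standard Arrow columnar layouts (fixed-width values, bit-packed
validity bitmap, int32 offsets plus data for utf8). The helper names are
hypothetical and are not part of any Arrow library.

```python
def fixed_width_range(offset, length, byte_width):
    """Byte (start, size) of a slice of a fixed-width values buffer."""
    return offset * byte_width, length * byte_width


def bitmap_range(offset, length):
    """Byte (start, size) of a validity bitmap slice, plus the bit shift.

    Bitmaps are bit-packed, so the slice may start in the middle of a byte;
    the receiver still needs the remaining bit offset (0..7).
    """
    first_byte = offset // 8
    end_byte = (offset + length + 7) // 8  # exclusive
    return first_byte, end_byte - first_byte, offset % 8


def string_ranges(value_offsets, offset, length):
    """Byte ranges for the offsets buffer and data buffer of a utf8 slice.

    `value_offsets` holds the n + 1 int32 offsets of the parent array.
    The sliced offsets still index into the ORIGINAL data buffer, so either
    the data buffer is passed whole or the offsets have to be rebased.
    """
    offsets_range = fixed_width_range(offset, length + 1, 4)
    data_start = value_offsets[offset]
    data_size = value_offsets[offset + length] - data_start
    return offsets_range, (data_start, data_size)


print(fixed_width_range(2, 3, 4))                 # (8, 12)
print(bitmap_range(10, 5))                        # (1, 1, 2)
print(string_ranges([0, 3, 3, 8, 10], 1, 2))      # ((4, 12), (3, 5))
```

Nested types (lists, structs, unions) would need further cases, which this
sketch does not attempt to cover.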
As a temporary fix for ARROW-13554 I am going to copy the array if it has an
offset. This means the transfer is not zero-copy in that case, so I'm creating
this issue to track a proper solution.
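The shape of that temporary workaround can be sketched as follows. This is a
plain Python model under the assumption of a single fixed-width values buffer,
not the actual C++ change; export_values is a hypothetical name.

```python
def export_values(data: bytes, offset: int, length: int, byte_width: int):
    """Return (buffer, copied) for a fixed-width array slice.

    With a zero offset the original buffer is passed through untouched
    (zero-copy). With a non-zero offset the visible range is copied into a
    fresh buffer that starts at byte 0, which is what the receiver expects.
    """
    if offset == 0:
        return data, False
    start = offset * byte_width
    end = start + length * byte_width
    return data[start:end], True  # slicing bytes makes a new object


parent = bytes(range(24))  # six 4-byte values

whole, copied_whole = export_values(parent, 0, 6, 4)
part, copied_part = export_values(parent, 2, 3, 4)

print(whole is parent, copied_whole)  # True False
print(len(part), copied_part)         # 12 True
```

The copy costs time and memory proportional to the slice, which is why it is
only a stopgap.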
--
This message was sent by Atlassian Jira
(v8.20.1#820001)