Weston Pace created ARROW-15275:
-----------------------------------

             Summary: [C++][JNI] DisposableScannerAdaptor does not handle arrays with offsets
                 Key: ARROW-15275
                 URL: https://issues.apache.org/jira/browse/ARROW-15275
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Weston Pace


The DisposableScannerAdaptor is a JNI bridge from Java to the C++ datasets API.
When it scans record batches, it collects all of the buffers from all of the
arrays and returns a list of buffer handles to Java, which puts them into an
ArrowRecordBatch on the Java end.
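A simplified model of that buffer collection, as a sketch only: `Buffer`,
`ArrayData`, `CollectBuffers` and `CollectBatchBuffers` are hypothetical
stand-ins, not the real arrow::ArrayData or the adaptor's code, and the
depth-first order (a column's own buffers, then its children's) is assumed from
the Arrow IPC layout that ArrowRecordBatch follows.  The relevant detail is that
only buffer pointers are gathered and each array's `offset` is never consulted.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for Arrow's array layout (not the real
// arrow::Buffer / arrow::ArrayData).
struct Buffer {
  std::vector<uint8_t> bytes;
};

struct ArrayData {
  int64_t offset = 0;  // logical start of this array within its buffers
  int64_t length = 0;  // number of logical elements
  std::vector<std::shared_ptr<Buffer>> buffers;  // validity first, then data
  std::vector<std::shared_ptr<ArrayData>> children;  // nested types
};

// Gather buffers depth-first: the array's own buffers, then its children's.
// `offset` is ignored, so a sliced array hands back its whole buffers.
void CollectBuffers(const ArrayData& array,
                    std::vector<std::shared_ptr<Buffer>>* out) {
  assert(out != nullptr);
  for (const auto& buffer : array.buffers) {
    out->push_back(buffer);
  }
  for (const auto& child : array.children) {
    CollectBuffers(*child, out);
  }
}

// Flatten every column of a batch into one list of buffer handles.
std::vector<std::shared_ptr<Buffer>> CollectBatchBuffers(
    const std::vector<std::shared_ptr<ArrayData>>& columns) {
  std::vector<std::shared_ptr<Buffer>> out;
  for (const auto& column : columns) {
    CollectBuffers(*column, &out);
  }
  return out;
}
```

Note that two columns sliced from the same parent would both return the same
underlying buffer, which is the shared-ownership concern raised below.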

Unfortunately, if an array has a non-zero offset (for example, because it is a
slice of a larger array), the bridge does not return buffers sliced to that
offset; it returns the entire underlying buffers.  The Java record batch is then
incorrect: the length is wrong (and so it doesn't fully free the memory) and
the values are incorrect.
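To illustrate the incorrect values, consider a hypothetical sliced int32 array
(`SlicedInt32` and both read functions are made up for this sketch) whose full
values buffer holds five elements while the slice covers only the last three.
A reader that receives the whole buffer but not the offset starts at element 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a sliced int32 array: `buffer` is the full values
// buffer shared with the parent array; (offset, length) select the slice.
struct SlicedInt32 {
  std::vector<int32_t> buffer;
  int64_t offset;
  int64_t length;
};

// What a consumer sees if it gets only the whole buffer plus the length:
// it reads `length` elements starting from element 0.
std::vector<int32_t> ReadIgnoringOffset(const SlicedInt32& a) {
  assert(a.length <= static_cast<int64_t>(a.buffer.size()));
  return std::vector<int32_t>(a.buffer.begin(), a.buffer.begin() + a.length);
}

// Correct interpretation: skip `offset` elements before reading.
std::vector<int32_t> ReadWithOffset(const SlicedInt32& a) {
  assert(a.offset + a.length <= static_cast<int64_t>(a.buffer.size()));
  auto first = a.buffer.begin() + a.offset;
  return std::vector<int32_t>(first, first + a.length);
}
```

For a buffer `{10, 20, 30, 40, 50}` with offset 2 and length 3,
`ReadIgnoringOffset` yields `10, 20, 30` while the slice actually holds
`30, 40, 50`.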

I'm not familiar enough with the Java implementation to suggest a good fix.
Figuring out the buffer offsets from the array offset is a bit tricky, since
the logic depends on the data type.  Also, I'm pretty sure the Java side would
then have to take ownership of the entire buffer, which could be tricky because
multiple batches could share ownership of the same buffer.
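Why the logic depends on the data type can be sketched roughly as follows.  All
names here are hypothetical, and only a few common layouts from the Arrow
columnar format are covered: fixed-width buffers translate directly to a byte
range, validity bitmaps only translate cleanly at byte-aligned offsets, and for
variable-length binary the data buffer's range depends on the offset values
themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Byte range [begin, end) within a buffer.
struct ByteRange {
  int64_t begin;
  int64_t end;
};

// Fixed-width values (int32, double, ...): the slice is a contiguous byte
// range that is a simple multiple of the element width.
ByteRange FixedWidthRange(int64_t offset, int64_t length, int64_t byte_width) {
  return {offset * byte_width, (offset + length) * byte_width};
}

// Validity bitmap: one bit per element, so the slice starts at bit `offset`.
// It maps to a byte sub-range only when offset % 8 == 0; otherwise the bits
// would have to be shifted, i.e. copied.
bool BitmapSliceIsByteAligned(int64_t offset) { return offset % 8 == 0; }

// Smallest byte range that covers the slice's bits.
ByteRange BitmapCoveringRange(int64_t offset, int64_t length) {
  return {offset / 8, (offset + length + 7) / 8};
}

// Variable-length binary/utf8 with int32 offsets: the offsets buffer needs
// length + 1 entries starting at entry `offset`.
ByteRange OffsetsRange(int64_t offset, int64_t length) {
  return FixedWidthRange(offset, length + 1, sizeof(int32_t));
}

// ...but the data buffer range can only be found by reading those offsets.
ByteRange BinaryDataRange(const std::vector<int32_t>& offsets, int64_t offset,
                          int64_t length) {
  assert(offset + length < static_cast<int64_t>(offsets.size()));
  return {offsets[offset], offsets[offset + length]};
}
```

For example, a slice starting at element 3 begins in the middle of the first
bitmap byte, so it cannot be handed over as a byte sub-range without shifting
the bits.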

As a temporary fix for ARROW-13554, I am going to copy the array if it has an
offset.  This means the transfer is not zero-copy, so I'm creating this issue to
solve this properly.
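The copy workaround amounts to something like the following, shown for a
hypothetical int32 column with a validity bitmap.  `Int32Column`, `GetBit` and
`CopyIfOffset` are illustrative names, not Arrow APIs, and the actual change
may well rely on Arrow's own copy utilities instead.  Any column with a
non-zero offset is rewritten into fresh, zero-offset buffers, at the cost of a
copy.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sliced fixed-width column: int32 values plus an LSB-first
// validity bitmap (bit i set means element i is valid).
struct Int32Column {
  std::vector<uint8_t> validity;
  std::vector<int32_t> values;
  int64_t offset;
  int64_t length;
};

bool GetBit(const std::vector<uint8_t>& bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// If the column has a non-zero offset, copy its slice into new zero-offset
// buffers.  (In this value-semantics model the early return also copies; it
// stands in for passing the original buffers through untouched.)
Int32Column CopyIfOffset(const Int32Column& in) {
  if (in.offset == 0) return in;
  assert(in.offset + in.length <= static_cast<int64_t>(in.values.size()));
  Int32Column out;
  out.offset = 0;
  out.length = in.length;
  // Values: a plain contiguous copy of the selected elements.
  out.values.assign(in.values.begin() + in.offset,
                    in.values.begin() + in.offset + in.length);
  // Validity: re-pack the bits so the slice starts at bit 0.
  out.validity.assign((in.length + 7) / 8, 0);
  for (int64_t i = 0; i < in.length; ++i) {
    if (GetBit(in.validity, in.offset + i)) {
      out.validity[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return out;
}
```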



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
