Jorge Leitão created ARROW-14453:
------------------------------------

             Summary: [C data interface] Clarify that buffers must only be 
accessed past the offset
                 Key: ARROW-14453
                 URL: https://issues.apache.org/jira/browse/ARROW-14453
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Jorge Leitão


I would like to propose that we clarify in the C data interface that the 
buffers can only be accessed from the offset onwards (with the pointer 
arithmetic corresponding to the buffer).

E.g. given a primitive array with an offset of 10 and a buffer starting at 
pointer position `p`, consumers _must not_ access any of the positions [p, p+1, 
..., p+9].
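
As a rough illustration of the rule above, the positions a consumer may touch 
can be computed from the array's offset and length. This is only a sketch: the 
helper name `accessible_positions`, the length of 2, and the 1-byte element 
width are assumptions made for the example and are not part of the C data 
interface.

```python
def accessible_positions(offset: int, length: int, byte_width: int = 1) -> range:
    """Byte positions, relative to the exported buffer pointer `p`,
    that a consumer is allowed to read under the proposed rule."""
    start = offset * byte_width
    return range(start, start + length * byte_width)


# Array offset of 10, hypothetical length of 2, u8 elements (1 byte each).
allowed = accessible_positions(offset=10, length=2)

# Everything before the first allowed position must not be accessed.
forbidden = range(0, allowed.start)

print(list(allowed))    # positions the consumer may read
print(list(forbidden))  # positions the consumer must not read
```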

Without the condition above, it is not possible for a user to use a sliced 
buffer in a primitive array that has both a validity bitmap and an offset.

E.g. consider an array with an offset of 10 whose values buffer holds 12 u8s 
and has been sliced by 4. For the array to be exported correctly, we need to 
shift the exported buffer pointer by -6 (4 - 10), so that the consumer skips 
the first 10 bytes and only "sees" the bytes at positions 4, 5, 6, etc. of the 
original pointer.
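
The arithmetic in this example can be checked with plain indices instead of 
real pointers. The sketch below is illustrative only: the variable names are 
made up, the buffer contents are chosen so that each value equals its position, 
and the array length of 8 (the 12 bytes minus the 4 sliced away) is an 
assumption.

```python
# 12 u8s; each value equals its position in the original buffer.
original = bytes(range(12))

buffer_slice_start = 4   # the buffer has been sliced by 4
array_offset = 10        # the array's offset
length = len(original) - buffer_slice_start  # assumed array length: 8

# Position of the exported pointer relative to the original pointer.
exported_start = buffer_slice_start - array_offset

# The consumer applies the array offset to the exported pointer, so every
# position it reads lands back inside the original buffer.
positions = [exported_start + array_offset + i for i in range(length)]
visible = [original[pos] for pos in positions]

print(exported_start)  # shift of the exported pointer
print(visible)         # bytes the consumer sees
```

The exported pointer itself lies before the original buffer, which is why the 
positions in front of the offset must never be read.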

Note that this behavior (slicing a buffer and building an array with it) can 
only be done with buffers. In a BooleanArray it is currently not possible to 
"slice" the buffer unless the slice is a multiple of 8 slots, since the C data 
interface has no mechanism to share independent offsets.
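
One way to read the multiple-of-8 restriction: boolean values are bit-packed, 
so a slice can be expressed by moving the buffer pointer only when it covers 
whole bytes. The helper names below are hypothetical and the sketch only 
illustrates that reading.

```python
def split_bit_slice(bit_offset: int) -> tuple[int, int]:
    """Split a slice of `bit_offset` slots into whole bytes (expressible
    by moving the buffer pointer) and leftover bits (not expressible)."""
    return divmod(bit_offset, 8)


def is_pointer_shift_only(bit_offset: int) -> bool:
    """True when the slice needs no separate per-buffer bit offset."""
    return bit_offset % 8 == 0


print(split_bit_slice(16), is_pointer_shift_only(16))
print(split_bit_slice(4), is_pointer_shift_only(4))
```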




--
This message was sent by Atlassian Jira
(v8.3.4#803005)