[
https://issues.apache.org/jira/browse/ARROW-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Antoine Pitrou updated ARROW-14453:
-----------------------------------
Component/s: Format
> [C data interface] Clarify that buffers must only be accessed past the offset
> -----------------------------------------------------------------------------
>
> Key: ARROW-14453
> URL: https://issues.apache.org/jira/browse/ARROW-14453
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Jorge Leitão
> Priority: Major
>
> I would like to propose that we clarify in the C data interface that
> buffers may only be accessed at or past the offset (with the pointer
> arithmetic corresponding to the buffer).
> E.g. given a primitive array with an offset of 10 and a buffer starting at
> pointer position `p`, consumers _must not_ access any of the positions [p,
> p+1, ..., p+9].
> Without the condition above, it is not possible for a user to use a sliced
> buffer in a primitive array that has a validity bitmap and an offset.
> E.g. consider an array with an offset of 10 and a buffer of 12 u8s that has
> been sliced by 4. For the array to be exported correctly, we need to offset
> the exported buffer pointer by -6 (4 - 10), so that the consumer can skip the
> first 10 bytes and only "see" the bytes at positions 4, 5, 6, etc. of the
> original pointer.
> Note that this behavior (slicing a buffer and building an array with it)
> is only possible with byte-addressed buffers. In a BooleanArray it is
> currently not possible to "slice" the buffer unless the slice is a multiple
> of 8 slots, since the C data interface has a single array-level offset and
> no mechanism to express independent offsets per buffer.
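
The arithmetic in the sliced-buffer example can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names (`exported_shift`, `consumer_index`, `read_element`) are hypothetical and not part of the Arrow C data interface, and the shift is modelled as a signed index rather than a real pointer, because forming a pointer before the start of an allocation is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an Arrow API: signed shift (in elements) that the
 * producer applies to the original buffer start so that a consumer skipping
 * `array_offset` elements lands on the first element of the slice. */
static int64_t exported_shift(int64_t slice_start, int64_t array_offset) {
    return slice_start - array_offset;
}

/* Index into the ORIGINAL allocation that a consumer touches when it reads
 * logical element `i` of the exported array. */
static int64_t consumer_index(int64_t shift, int64_t array_offset, int64_t i) {
    return shift + array_offset + i;
}

/* Read logical element `i`. The consumer never indexes below
 * shift + array_offset, so the negative shift itself is never dereferenced. */
static uint8_t read_element(const uint8_t *original, int64_t shift,
                            int64_t array_offset, int64_t i) {
    int64_t idx = consumer_index(shift, array_offset, i);
    assert(idx >= 0); /* would fail if positions before the offset were read */
    return original[idx];
}
```

With a buffer sliced by 4 and an array offset of 10, `exported_shift(4, 10)` gives -6 and logical element 0 maps to position 4 of the original buffer. A consumer that touched positions before the offset would instead index as low as -6, outside the allocation, which is the access the proposal asks to forbid.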