I'm afraid I've derailed the discussion into solving a bigger problem
than strictly necessary. I don't think this is the time to solve the
general problem of the C data interface having no way to communicate
buffer sizes, particularly since there's no immediate agreement on the
utility or implementation of a general solution. Perhaps, though, we
can solve the immediate problem in a way that does not preclude a
generic solution in the future.

I think Ben's initial proposal of incrementing n_buffers by one and
appending an int64_t* pointing to the buffer sizes accomplishes that,
so consider me a +1. It might perhaps be more general if it included
all buffer sizes (not just variadic ones), but given that it would
only be useful for a few other types I don't think that is a game
changer.
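
To make the layout concrete, here is a rough sketch of how a consumer
might read the sizes under that proposal. It assumes the buffer order
for a Utf8View array is validity, views, the variadic data buffers, and
then the appended int64_t sizes buffer as the last entry. The
utf8view_* helper names are hypothetical and only for illustration;
struct ArrowArray is the existing definition from the C data interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Existing struct from the Arrow C data interface, repeated here so the
   sketch is self-contained. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* ASSUMED layout for a Utf8View array under the proposal:
     buffers[0]               validity bitmap
     buffers[1]               views
     buffers[2 .. n-2]        variadic data buffers
     buffers[n-1]             int64_t sizes of the variadic data buffers
   where n == n_buffers. */

/* Hypothetical helper: number of variadic data buffers, or -1 if the
   array has too few buffers to match the assumed layout. */
static int64_t utf8view_n_variadic(const struct ArrowArray* array) {
  if (array == NULL || array->n_buffers < 3) {
    return -1;
  }
  return array->n_buffers - 3;
}

/* Hypothetical helper: pointer to the sizes buffer (one int64_t per
   variadic data buffer), or NULL if the layout does not match. */
static const int64_t* utf8view_variadic_sizes(const struct ArrowArray* array) {
  if (utf8view_n_variadic(array) < 0) {
    return NULL;
  }
  return (const int64_t*)array->buffers[array->n_buffers - 1];
}
```

A consumer would then read the byte length of variadic buffer i, stored
at buffers[2 + i], as utf8view_variadic_sizes(array)[i]. Note that this
sketch dereferences the sizes buffer on the CPU.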

It is probably also worth noting whether we expect the buffer
containing the sizes to live on the CPU device always or whether we
want it to live on the same device as the data buffers.

On Thu, Oct 26, 2023 at 4:34 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Le 26/10/2023 à 20:02, Benjamin Kietzman a écrit :
> >> Is this buffer lengths buffer only present if the array type is Utf8View?
> >
> > IIUC, the proposal would add the buffer lengths buffer for all types if the
> > schema's
> > flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it appealing to avoid
> > the special case and that `n_buffers` would continue to be consistent with
> > IPC.
>
> This begs the question of what happens if a consumer receives an unknown
> flag value. We haven't specified that unknown flag values should be
> ignored, so a consumer could judiciously choose to error out instead of
> potentially misinterpreting the data.
>
> All in all, personally I'd rather we make a special case for Utf8View
> instead of adding a flag that can lead to worse interoperability.
>
> Regards
>
> Antoine.