Re: [DISCUSS][Format] C data interface for Utf8View

Dewey Dunnington Thu, 26 Oct 2023 12:43:50 -0700

> This begs the question of what happens if a consumer receives an unknown
> flag value

That's a great point. I might be the only person who has implemented
a deep copy of an ArrowSchema in C, but that copy blindly passes along
the schema's flags value, which in the scenario I proposed could lead
a consumer to access a pointer that doesn't exist.
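
To make the failure mode concrete, here is a minimal sketch of the
stricter consumer behaviour Antoine describes: refuse any schema whose
flags contain bits the consumer does not know. The three ARROW_FLAG_*
values are the ones defined by the C data interface. Everything else is
an assumption for illustration: SketchSchema is a stand-in that only
carries the flags field, the helper names are made up, and
HYPOTHETICAL_ARROW_FLAG_BUFFER_LENGTHS with a value of 8 is not part of
any spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values defined by the Arrow C data interface. */
#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

/* Hypothetical: the flag floated in this thread. The value 8 is assumed. */
#define HYPOTHETICAL_ARROW_FLAG_BUFFER_LENGTHS 8

/* All bits this (imaginary) consumer knows how to interpret. */
#define KNOWN_FLAGS_MASK                                  \
  (ARROW_FLAG_DICTIONARY_ORDERED | ARROW_FLAG_NULLABLE |  \
   ARROW_FLAG_MAP_KEYS_SORTED)

/* Hypothetical stand-in for struct ArrowSchema; only flags is needed here. */
struct SketchSchema {
  int64_t flags;
};

/* Returns the bits of flags that fall outside KNOWN_FLAGS_MASK. */
static int64_t unknown_schema_flags(int64_t flags) {
  return flags & ~(int64_t)KNOWN_FLAGS_MASK;
}

/* Returns 0 if every flag bit is understood, -1 if the consumer should
 * error out rather than risk misinterpreting the data. */
static int check_schema_flags(const struct SketchSchema* schema) {
  assert(schema != NULL);
  return unknown_schema_flags(schema->flags) == 0 ? 0 : -1;
}
```

A deep copy that wanted to be safe could run the same check before
copying flags, instead of passing the value through untouched.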

I do think there is utility in considering buffer sizes more
generically in the future. If tracking them is apparently so essential
that every Arrow implementation does it, it seems like an oversight to
have producers constantly omitting buffer sizes and consumers
constantly recalculating them.
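
For reference, this is roughly the kind of recalculation I mean, sketched
for a plain Utf8 array with int32 offsets. The function names are made up
for illustration, and the results are the minimum number of bytes the
array references, not necessarily the size the producer allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a validity bitmap covering offset + length slots. */
static int64_t validity_buffer_size(int64_t offset, int64_t length) {
  return (offset + length + 7) / 8;
}

/* Bytes needed for the int32 offsets buffer: one more entry than slots. */
static int64_t utf8_offsets_buffer_size(int64_t offset, int64_t length) {
  return (offset + length + 1) * (int64_t)sizeof(int32_t);
}

/* Bytes of the data buffer referenced by the array: the last offset
 * that the array can reach. */
static int64_t utf8_data_buffer_size(const int32_t* offsets, int64_t offset,
                                     int64_t length) {
  assert(offsets != NULL);
  return offsets[offset + length];
}
```

Nothing comparable is available for the variadic data buffers of a
Utf8View array, since there is no offsets buffer whose last element
gives the size, which is why the sizes have to be communicated somehow.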

On Thu, Oct 26, 2023 at 4:35 PM Dewey Dunnington <[email protected]> wrote:
>
> I'm afraid I've derailed the discussion into solving a bigger problem
> than strictly necessary. I don't think this is the time to solve the
> general problem of the C data interface having no way to communicate
> buffer sizes, particularly since there's no immediate agreement on its
> utility or implementation, but perhaps it is possible to solve it in a
> way that does not preclude implementing it in some generic way in the
> future.
>
> I think Ben's initial proposal of incrementing n_buffers by one and
> appending an int64_t* pointing to the buffer sizes accomplishes that,
> so consider me a +1. It might perhaps be more general if it included
> all buffer sizes (not just variadic ones), but given that it would
> only be useful for a few other types I don't think that is a game
> changer.
>
> It is probably also worth noting whether we expect the buffer
> containing the sizes to live on the CPU device always or whether we
> want it to live on the same device as the data buffers.
>
> On Thu, Oct 26, 2023 at 4:34 PM Antoine Pitrou <[email protected]> wrote:
> >
> >
> > On 26/10/2023 at 20:02, Benjamin Kietzman wrote:
> > >> Is this buffer lengths buffer only present if the array type is Utf8View?
> > >
> > > IIUC, the proposal would add the buffer lengths buffer for all types if
> > > the schema's flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it
> > > appealing to avoid the special case and that `n_buffers` would continue
> > > to be consistent with IPC.
> >
> > This begs the question of what happens if a consumer receives an unknown
> > flag value. We haven't specified that unknown flag values should be
> > ignored, so a consumer could judiciously choose to error out instead of
> > potentially misinterpreting the data.
> >
> > All in all, personally I'd rather we make a special case for Utf8View
> > instead of adding a flag that can lead to worse interoperability.
> >
> > Regards
> >
> > Antoine.
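
For anyone skimming the quoted thread, Ben's proposal (n_buffers
incremented by one, with a trailing int64_t* holding the variadic buffer
sizes) could be read by a consumer along these lines. This is only a
sketch under assumptions: SketchArray is a cut-down stand-in for struct
ArrowArray, the helper names are made up, and the layout assumed is
validity, views, the variadic data buffers, then the sizes buffer last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ArrowArray; only the fields used here. */
struct SketchArray {
  int64_t n_buffers;
  const void** buffers;
};

/* Assumed layout for a Utf8View array under the proposal:
 *   buffers[0]                  validity bitmap
 *   buffers[1]                  views
 *   buffers[2 .. n_buffers-2]   variadic data buffers
 *   buffers[n_buffers-1]        int64_t sizes, one per variadic data buffer
 */
static int64_t variadic_buffer_count(const struct SketchArray* array) {
  assert(array != NULL && array->n_buffers >= 3);
  return array->n_buffers - 3;
}

/* Size in bytes of the i-th variadic data buffer, read from the
 * trailing sizes buffer. */
static int64_t variadic_buffer_size(const struct SketchArray* array,
                                    int64_t i) {
  assert(i >= 0 && i < variadic_buffer_count(array));
  const int64_t* sizes =
      (const int64_t*)array->buffers[array->n_buffers - 1];
  return sizes[i];
}
```

This dereferences the sizes buffer directly, so it only works as written
if that buffer lives on the CPU, which is the open question from my
earlier message.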
