Re: [DISCUSS][Format] C data interface for Utf8View

Dewey Dunnington Sun, 29 Oct 2023 16:59:38 -0700

In the absence of a general solution to the C data interface omitting
buffer sizes, I think the original proposal is the best way forward.
This is the first type to be added whose buffer sizes cannot be
calculated without looping over every element of the array, and a
consumer that cares about buffer sizes needs them to efficiently
serialize the imported array to IPC.
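
To illustrate the per-element loop, here is a minimal sketch in C. It
assumes the 16-byte view layout from the Utf8View format proposal (a
4-byte length, then either 12 inline bytes or a 4-byte prefix, a 4-byte
buffer index and a 4-byte offset), a zero array offset, and
native-endian int32 fields. The function name and its parameters are
hypothetical and not part of any Arrow library. It only recovers the
smallest extent each data buffer must have, which is a lower bound on
the real buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIEW_SIZE 16  /* bytes per view (assumed layout) */
#define INLINE_MAX 12 /* strings up to this length are stored inline */

/* Hypothetical helper: scan every view and record, for each variadic
 * data buffer, the largest (offset + length) that any valid element
 * references. `validity` may be NULL when there are no nulls.
 * Returns 0 on success, -1 if a view points outside the known buffers. */
int utf8view_min_data_extents(const uint8_t* views, const uint8_t* validity,
                              int64_t length, int64_t* extents,
                              int32_t n_data_buffers) {
  assert(views != NULL || length == 0);

  for (int32_t b = 0; b < n_data_buffers; b++) {
    extents[b] = 0;
  }

  for (int64_t i = 0; i < length; i++) {
    /* Skip null slots: their view contents are not meaningful. */
    if (validity != NULL && !((validity[i / 8] >> (i % 8)) & 1)) {
      continue;
    }

    const uint8_t* view = views + i * VIEW_SIZE;
    int32_t str_len;
    memcpy(&str_len, view, sizeof(str_len));
    if (str_len <= INLINE_MAX) {
      continue; /* inline string, no data buffer involved */
    }

    int32_t buffer_index;
    int32_t data_offset;
    memcpy(&buffer_index, view + 8, sizeof(buffer_index));
    memcpy(&data_offset, view + 12, sizeof(data_offset));
    if (buffer_index < 0 || buffer_index >= n_data_buffers || data_offset < 0) {
      return -1;
    }

    int64_t end = (int64_t)data_offset + str_len;
    if (end > extents[buffer_index]) {
      extents[buffer_index] = end;
    }
  }
  return 0;
}
```

The cost is one pass over all `length` views, which is the work a
consumer would have to repeat on every import if the producer does not
pass the sizes along.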


Using a schema's flags to indicate something about a specific paired
array (particularly one that, if misinterpreted, would lead to a
crash) is a precedent that is probably not worth introducing for just
one type. Currently a schema is completely independent of any
particular ArrowArray, and I think that is a feature that is worth
preserving. My gripe about not having buffer sizes on the CPU to more
efficiently copy between devices is almost certainly better addressed
by the ArrowDeviceArray struct.
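
For context on the misinterpretation risk, the following is a minimal
sketch of the strict behavior discussed in the quoted thread below: a
consumer that rejects a schema carrying any flag bit it does not
recognize. The three flag values are the ones currently defined by the
C data interface. The `KNOWN_SCHEMA_FLAGS` mask and the
`check_schema_flags` function are hypothetical names, and nothing in
the specification currently requires this check.

```c
#include <stdint.h>

/* Flags currently defined by the Arrow C data interface. */
#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

/* Hypothetical mask of every flag this consumer understands. */
#define KNOWN_SCHEMA_FLAGS                                  \
  (ARROW_FLAG_DICTIONARY_ORDERED | ARROW_FLAG_NULLABLE |    \
   ARROW_FLAG_MAP_KEYS_SORTED)

/* Hypothetical strict check: returns 0 if every set bit is known,
 * -1 if the schema carries a flag this consumer cannot interpret. */
int check_schema_flags(int64_t flags) {
  return (flags & ~(int64_t)KNOWN_SCHEMA_FLAGS) == 0 ? 0 : -1;
}
```

A consumer that skips such a check and silently ignores a new
array-affecting flag would read the array's buffers with the wrong
layout.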

On Fri, Oct 27, 2023 at 12:45 PM Benjamin Kietzman <[email protected]> wrote:
>
> > This begs the question of what happens if a consumer receives an unknown
> > flag value.
>
> It seems to me that ignoring unknown flags is the primary case to consider
> at this point, since consumers may ignore unknown flags. Since that is the
> case, it seems adding any flag which would break such a consumer would be
> tantamount to an ABI breakage. I don't think this can be averted unless all
> consumers are required to error out on unknown flag values.
>
> In the specific case of Utf8View it seems certain that consumers would add
> support for the buffer sizes flag simultaneously with adding support for the
> new type (since Utf8View is difficult to import otherwise), so any consumer
> which would error out on the new flag would already be erroring out on an
> unsupported data type.
>
> > I might be the only person who has implemented
> > a deep copy of an ArrowSchema in C, but it does blindly pass along a
> > schema's flag value
>
> I think passing a schema's flag value including unknown flags is an error.
> The ABI defines moving structures but does not define deep copying. I think
> in order to copy deeply in terms of operations which *are* specified: we
> import then export the schema. Since this includes an export step, it
> should not include flags which are not supported by the exporter.
>
> On Thu, Oct 26, 2023 at 6:40 PM Antoine Pitrou <[email protected]> wrote:
>
> >
> > Le 26/10/2023 à 20:02, Benjamin Kietzman a écrit :
> > >> Is this buffer lengths buffer only present if the array type is
> > >> Utf8View?
> > >
> > > IIUC, the proposal would add the buffer lengths buffer for all types if
> > > the schema's flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it
> > > appealing to avoid the special case and that `n_buffers` would continue
> > > to be consistent with IPC.
> >
> > This begs the question of what happens if a consumer receives an unknown
> > flag value. We haven't specified that unknown flag values should be
> > ignored, so a consumer could judiciously choose to error out instead of
> > potentially misinterpreting the data.
> >
> > All in all, personally I'd rather we make a special case for Utf8View
> > instead of adding a flag that can lead to worse interoperability.
> >
> > Regards
> >
> > Antoine.
> >
