Re: [DISCUSS][Format] C data interface for Utf8View

Will Jones Tue, 07 Nov 2023 07:46:10 -0800

I agree with the approach originally proposed by Ben. It seems like the
most straightforward way to implement this within the current protocol.


On Sun, Oct 29, 2023 at 4:59 PM Dewey Dunnington
<de...@voltrondata.com.invalid> wrote:

> In the absence of a general solution to the C data interface omitting
> buffer sizes, I think the original proposal is the best way forward.
> This is the first type to be added whose buffer sizes cannot be
> calculated without looping over every element of the array; the
> buffer sizes are needed to efficiently serialize the imported array to
> IPC if imported by a consumer that cares about buffer sizes.
>
> Using a schema's flags to indicate something about a specific paired
> array (particularly one that, if misinterpreted, would lead to a
> crash) is a precedent that is probably not worth introducing for just
> one type. Currently a schema is completely independent of any
> particular ArrowArray, and I think that is a feature that is worth
> preserving. My gripe about not having buffer sizes available on the CPU
> (to copy between devices more efficiently) is almost certainly better
> addressed in the ArrowDeviceArray struct.
>
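To illustrate the point about looping over every element: the sketch below recovers a lower bound on each variadic data buffer's size from the views alone. It assumes the 16-byte view layout from the Arrow columnar format (a 4-byte length, then either 12 inline bytes or a 4-byte prefix, buffer index, and offset). The names `StringView` and `min_data_buffer_sizes` are hypothetical, and the validity bitmap is ignored for brevity, so every slot is treated as valid.

```c
#include <stdint.h>
#include <string.h>

/* One 16-byte view element (hypothetical type name). */
typedef union {
  /* Strings of 12 bytes or fewer are stored inline. */
  struct { int32_t length; char data[12]; } inlined;
  /* Longer strings reference a range inside one of the data buffers. */
  struct { int32_t length; char prefix[4]; int32_t buffer_index; int32_t offset; } ref;
} StringView;

/* Hypothetical helper: scan every view and record, per data buffer, the
 * largest (offset + length) that any view references. This is only a lower
 * bound on the buffer's real size, and it costs O(n_views). */
static void min_data_buffer_sizes(const StringView* views, int64_t n_views,
                                  int64_t* sizes, int32_t n_data_buffers) {
  for (int32_t b = 0; b < n_data_buffers; b++) sizes[b] = 0;

  for (int64_t i = 0; i < n_views; i++) {
    int32_t length = views[i].inlined.length;
    if (length <= 12) continue; /* inline view: touches no data buffer */

    int32_t b = views[i].ref.buffer_index;
    int64_t end = (int64_t)views[i].ref.offset + length;
    /* Skip out-of-range indices instead of writing out of bounds. */
    if (b >= 0 && b < n_data_buffers && end > sizes[b]) sizes[b] = end;
  }
}
```

A consumer that receives the sizes from the producer can skip this pass entirely, which is the efficiency argument made above.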
> On Fri, Oct 27, 2023 at 12:45 PM Benjamin Kietzman <bengil...@gmail.com>
> wrote:
> >
> > > This begs the question of what happens if a consumer receives an
> > > unknown flag value.
> >
> > It seems to me that ignoring unknown flags is the primary case to
> > consider at this point, since consumers may ignore unknown flags. Since
> > that is the case, it seems adding any flag which would break such a
> > consumer would be tantamount to an ABI breakage. I don't think this can
> > be averted unless all consumers are required to error out on unknown
> > flag values.
> >
> > In the specific case of Utf8View it seems certain that consumers would
> > add support for the buffer sizes flag simultaneously with adding support
> > for the new type (since Utf8View is difficult to import otherwise), so
> > any consumer which would error out on the new flag would already be
> > erroring out on an unsupported data type.
> >
> > > I might be the only person who has implemented
> > > a deep copy of an ArrowSchema in C, but it does blindly pass along a
> > > schema's flag value
> >
> > I think passing a schema's flag value including unknown flags is an
> > error. The ABI defines moving structures but does not define deep
> > copying. I think that, in order to copy deeply in terms of operations
> > which *are* specified, we import then export the schema. Since this
> > includes an export step, it should not include flags which are not
> > supported by the exporter.
> >
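The two flag-handling policies discussed above can be sketched as follows. The three `ARROW_FLAG_*` values are the ones defined by the C data interface; `KNOWN_FLAGS`, `check_flags_strict`, and `flags_for_export` are hypothetical names chosen for illustration, and this is only one possible reading of "import then export", not an agreed rule.

```c
#include <stdint.h>

/* Flag values defined by the Arrow C data interface. */
#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

/* Every flag this (hypothetical) implementation understands. */
#define KNOWN_FLAGS \
  (ARROW_FLAG_DICTIONARY_ORDERED | ARROW_FLAG_NULLABLE | ARROW_FLAG_MAP_KEYS_SORTED)

/* Strict import policy: reject a schema carrying any bit we do not
 * understand. Returns 0 on success and -1 on an unknown flag. */
static int check_flags_strict(int64_t flags) {
  return (flags & ~(int64_t)KNOWN_FLAGS) != 0 ? -1 : 0;
}

/* Re-export policy for a copy made by import-then-export: only emit the
 * flags this exporter supports, dropping anything it did not understand. */
static int64_t flags_for_export(int64_t imported_flags) {
  return imported_flags & (int64_t)KNOWN_FLAGS;
}
```

For example, with bit 8 standing in as a placeholder for some future flag (its real value is not given in this thread), `check_flags_strict(ARROW_FLAG_NULLABLE | 8)` fails, while `flags_for_export(ARROW_FLAG_NULLABLE | 8)` keeps only `ARROW_FLAG_NULLABLE`.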
> > On Thu, Oct 26, 2023 at 6:40 PM Antoine Pitrou <anto...@python.org>
> > wrote:
> >
> > >
> > > On 26/10/2023 at 20:02, Benjamin Kietzman wrote:
> > > >> Is this buffer lengths buffer only present if the array type is
> > > >> Utf8View?
> > > >
> > > > IIUC, the proposal would add the buffer lengths buffer for all types
> > > > if the schema's flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it
> > > > appealing to avoid the special case and that `n_buffers` would
> > > > continue to be consistent with IPC.
> > >
> > > This begs the question of what happens if a consumer receives an
> > > unknown flag value. We haven't specified that unknown flag values
> > > should be ignored, so a consumer could judiciously choose to error out
> > > instead of potentially misinterpreting the data.
> > >
> > > All in all, personally I'd rather we make a special case for Utf8View
> > > instead of adding a flag that can lead to worse interoperability.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
>
