I agree with the approach originally proposed by Ben. It seems like the most straightforward way to implement this within the current protocol.
On Sun, Oct 29, 2023 at 4:59 PM Dewey Dunnington <de...@voltrondata.com.invalid> wrote:
> In the absence of a general solution to the C data interface omitting
> buffer sizes, I think the original proposal is the best way
> forward...this is the first type to be added whose buffer sizes cannot
> be calculated without looping over every element of the array; the
> buffer sizes are needed to efficiently serialize the imported array to
> IPC if imported by a consumer that cares about buffer sizes.
>
> Using a schema's flags to indicate something about a specific paired
> array (particularly one that, if misinterpreted, would lead to a
> crash) is a precedent that is probably not worth introducing for just
> one type. Currently a schema is completely independent of any
> particular ArrowArray, and I think that is a feature that is worth
> preserving. My gripes about not having buffer sizes on the CPU to more
> efficiently copy between devices is a concept almost certainly better
> suited to the ArrowDeviceArray struct.
>
> On Fri, Oct 27, 2023 at 12:45 PM Benjamin Kietzman <bengil...@gmail.com>
> wrote:
> >
> > > This begs the question of what happens if a consumer receives an
> > > unknown flag value.
> >
> > It seems to me that ignoring unknown flags is the primary case to
> > consider at this point, since consumers may ignore unknown flags. Since
> > that is the case, it seems adding any flag which would break such a
> > consumer would be tantamount to an ABI breakage. I don't think this can
> > be averted unless all consumers are required to error out on unknown
> > flag values.
> >
> > In the specific case of Utf8View it seems certain that consumers would
> > add support for the buffer sizes flag simultaneously with adding support
> > for the new type (since Utf8View is difficult to import otherwise), so
> > any consumer which would error out on the new flag would already be
> > erroring out on an unsupported data type.
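To make concrete the cost Dewey describes, here is a minimal C sketch of what a consumer has to do today to recover the data-buffer sizes of an imported Utf8View array: visit every view. It assumes the 16-byte view layout of the Arrow columnar format (strings of up to 12 bytes stored inline; otherwise a 4-byte prefix, an int32 buffer index at byte 8 and an int32 offset at byte 12) read in native byte order. The names `utf8view_min_buffer_sizes` and `view_is_valid` are hypothetical and not part of any Arrow library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each Utf8View element is a 16-byte view. Strings of up to 12 bytes are
 * stored inline; longer strings store a 4-byte prefix, the index of a
 * variadic data buffer, and an offset into that buffer. */
#define VIEW_SIZE 16
#define VIEW_INLINE_MAX 12

/* Hypothetical helper: returns 1 if slot i is non-null. A NULL bitmap
 * means every slot is valid (bits are LSB-first, as in Arrow). */
static int view_is_valid(const uint8_t *validity, int64_t i) {
  return validity == NULL || ((validity[i >> 3] >> (i & 7)) & 1);
}

/* Hypothetical helper: for each variadic data buffer, computes the minimum
 * number of bytes the views actually reference. Every element in
 * [offset, offset + length) is visited, which is the O(length) pass the
 * thread wants to avoid. Returns 0 on success, -1 if a view refers to a
 * buffer index that does not exist. */
static int utf8view_min_buffer_sizes(const uint8_t *validity,
                                     const uint8_t *views, int64_t offset,
                                     int64_t length, int64_t n_data_buffers,
                                     int64_t *sizes_out) {
  assert(views != NULL && sizes_out != NULL);
  for (int64_t b = 0; b < n_data_buffers; b++) sizes_out[b] = 0;

  for (int64_t i = offset; i < offset + length; i++) {
    if (!view_is_valid(validity, i)) continue; /* skip null slots */
    const uint8_t *view = views + i * VIEW_SIZE;
    int32_t str_len, buf_index, buf_offset;
    memcpy(&str_len, view, sizeof str_len); /* memcpy avoids unaligned reads */
    if (str_len <= VIEW_INLINE_MAX) continue; /* inline: no data buffer used */
    memcpy(&buf_index, view + 8, sizeof buf_index);
    memcpy(&buf_offset, view + 12, sizeof buf_offset);
    if (buf_index < 0 || buf_index >= n_data_buffers) return -1;
    int64_t end = (int64_t)buf_offset + str_len;
    if (end > sizes_out[buf_index]) sizes_out[buf_index] = end;
  }
  return 0;
}
```

The scan only yields a lower bound on each buffer's size, since a producer may have allocated more than the views reference; a size exported by the producer avoids both the full pass over the array and that ambiguity.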
> >
> > > I might be the only person who has implemented
> > > a deep copy of an ArrowSchema in C, but it does blindly pass along a
> > > schema's flag value
> >
> > I think passing a schema's flag value including unknown flags is an
> > error. The ABI defines moving structures but does not define deep
> > copying. I think in order to copy deeply in terms of operations which
> > *are* specified: we import then export the schema. Since this includes
> > an export step, it should not include flags which are not supported by
> > the exporter.
> >
> > On Thu, Oct 26, 2023 at 6:40 PM Antoine Pitrou <anto...@python.org>
> > wrote:
> >
> > >
> > > On 26/10/2023 at 20:02, Benjamin Kietzman wrote:
> > > > > Is this buffer lengths buffer only present if the array type is
> > > > > Utf8View?
> > > >
> > > > IIUC, the proposal would add the buffer lengths buffer for all types
> > > > if the schema's flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it
> > > > appealing to avoid the special case and that `n_buffers` would
> > > > continue to be consistent with IPC.
> > >
> > > This begs the question of what happens if a consumer receives an
> > > unknown flag value. We haven't specified that unknown flag values
> > > should be ignored, so a consumer could judiciously choose to error out
> > > instead of potentially misinterpreting the data.
> > >
> > > All in all, personally I'd rather we make a special case for Utf8View
> > > instead of adding a flag that can lead to worse interoperability.
> > >
> > > Regards
> > >
> > > Antoine.
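The flag-handling policies debated above can be sketched in C: a strict consumer that rejects unknown flags (Antoine's option), and a deep copy that behaves like import followed by export and therefore keeps only the flags the exporter supports (Ben's view). The three flag constants and their values are the ones defined by the C data interface specification for `ArrowSchema.flags`; `KNOWN_SCHEMA_FLAGS`, `check_schema_flags_strict` and `flags_for_deep_copy` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values defined by the Arrow C data interface specification. */
#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

/* Hypothetical: the set of flags this consumer/exporter understands. */
#define KNOWN_SCHEMA_FLAGS \
  (ARROW_FLAG_DICTIONARY_ORDERED | ARROW_FLAG_NULLABLE | ARROW_FLAG_MAP_KEYS_SORTED)

/* Strict import policy: refuse a schema carrying any flag bit we do not
 * recognize, rather than risk misinterpreting the paired array.
 * Returns 0 if all flags are known, EINVAL otherwise. */
static int check_schema_flags_strict(int64_t flags) {
  return (flags & ~(int64_t)KNOWN_SCHEMA_FLAGS) ? EINVAL : 0;
}

/* Deep-copy policy: treat the copy as an import followed by an export,
 * so the copied schema only carries flags the exporter itself supports. */
static int64_t flags_for_deep_copy(int64_t flags) {
  int64_t copied = flags & (int64_t)KNOWN_SCHEMA_FLAGS;
  assert(check_schema_flags_strict(copied) == 0); /* never leaks unknown bits */
  return copied;
}
```

A lenient consumer would skip the check entirely and ignore unknown bits, which is exactly the kind of consumer Ben argues a new flag such as the proposed ARROW_FLAG_BUFFER_LENGTHS could break.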