Given the constraint of not changing the existing struct definitions, adding a new buffer seems like the only way forward, as far as I understand. It is unfortunate that each array now needs a new allocation (for the buffer lengths) when passed via FFI, but I don't have any other suggestions.
Andrew

On Tue, Nov 7, 2023 at 5:46 PM Weston Pace <weston.p...@gmail.com> wrote:

> +1 for the original proposal as well.
>
> ---
>
> The (minor) problem I see with flags is that there isn't much point to
> this feature if you are gating on a flag. I'm assuming the goal is what
> Dewey originally mentioned which is making buffer calculations easier.
> However, if you're gating the feature with a flag then you are either:
>
> * Rejecting input from producers that don't support this feature
>   (undesirable, better to align on one use model if we can)
> * Doing all the work anyways to handle producers that don't support the
>   feature
>
> Maybe it makes sense for a long term migration (e.g. we all agree this
> is something we want to move towards but we need to handle old producers
> in the meantime) but we can always discuss that separately and I don't
> think the benefit here is worth the confusion.
>
> On Tue, Nov 7, 2023 at 7:46 AM Will Jones <will.jones...@gmail.com> wrote:
>
> > I agree with the approach originally proposed by Ben. It seems like
> > the most straightforward way to implement within the current protocol.
> >
> > On Sun, Oct 29, 2023 at 4:59 PM Dewey Dunnington
> > <de...@voltrondata.com.invalid> wrote:
> >
> > > In the absence of a general solution to the C data interface
> > > omitting buffer sizes, I think the original proposal is the best
> > > way forward...this is the first type to be added whose buffer sizes
> > > cannot be calculated without looping over every element of the
> > > array; the buffer sizes are needed to efficiently serialize the
> > > imported array to IPC if imported by a consumer that cares about
> > > buffer sizes.
> > >
> > > Using a schema's flags to indicate something about a specific
> > > paired array (particularly one that, if misinterpreted, would lead
> > > to a crash) is a precedent that is probably not worth introducing
> > > for just one type. Currently a schema is completely independent of
> > > any particular ArrowArray, and I think that is a feature that is
> > > worth preserving. My gripes about not having buffer sizes on the
> > > CPU to more efficiently copy between devices is a concept almost
> > > certainly better suited to the ArrowDeviceArray struct.
> > >
> > > On Fri, Oct 27, 2023 at 12:45 PM Benjamin Kietzman
> > > <bengil...@gmail.com> wrote:
> > >
> > > > > This begs the question of what happens if a consumer receives
> > > > > an unknown flag value.
> > > >
> > > > It seems to me that ignoring unknown flags is the primary case
> > > > to consider at this point, since consumers may ignore unknown
> > > > flags. Since that is the case, it seems adding any flag which
> > > > would break such a consumer would be tantamount to an ABI
> > > > breakage. I don't think this can be averted unless all consumers
> > > > are required to error out on unknown flag values.
> > > >
> > > > In the specific case of Utf8View it seems certain that consumers
> > > > would add support for the buffer sizes flag simultaneously with
> > > > adding support for the new type (since Utf8View is difficult to
> > > > import otherwise), so any consumer which would error out on the
> > > > new flag would already be erroring out on an unsupported data
> > > > type.
> > > >
> > > > > I might be the only person who has implemented a deep copy of
> > > > > an ArrowSchema in C, but it does blindly pass along a schema's
> > > > > flag value
> > > >
> > > > I think passing a schema's flag value including unknown flags is
> > > > an error. The ABI defines moving structures but does not define
> > > > deep copying. I think in order to copy deeply in terms of
> > > > operations which *are* specified: we import then export the
> > > > schema. Since this includes an export step, it should not
> > > > include flags which are not supported by the exporter.
> > > >
> > > > On Thu, Oct 26, 2023 at 6:40 PM Antoine Pitrou
> > > > <anto...@python.org> wrote:
> > > >
> > > > > On 26/10/2023 at 20:02, Benjamin Kietzman wrote:
> > > > > > > Is this buffer lengths buffer only present if the array
> > > > > > > type is Utf8View?
> > > > > >
> > > > > > IIUC, the proposal would add the buffer lengths buffer for
> > > > > > all types if the schema's flags include
> > > > > > ARROW_FLAG_BUFFER_LENGTHS. I do find it appealing to avoid
> > > > > > the special case and that `n_buffers` would continue to be
> > > > > > consistent with IPC.
> > > > >
> > > > > This begs the question of what happens if a consumer receives
> > > > > an unknown flag value. We haven't specified that unknown flag
> > > > > values should be ignored, so a consumer could judiciously
> > > > > choose to error out instead of potentially misinterpreting
> > > > > the data.
> > > > >
> > > > > All in all, personally I'd rather we make a special case for
> > > > > Utf8View instead of adding a flag that can lead to worse
> > > > > interoperability.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
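
To illustrate the point quoted above, that Utf8View buffer sizes cannot be calculated without looping over every element, here is a minimal C sketch of the scan a consumer would need when no buffer lengths are supplied. The function name `min_data_buffer_sizes` is hypothetical and not part of any Arrow library. The sketch assumes the 16-byte view layout from the Arrow columnar format (an int32 length first; strings of up to 12 bytes stored inline; longer strings storing an int32 buffer index at byte 8 and an int32 offset at byte 12) in native byte order, and for brevity it ignores the validity bitmap and the array offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed view layout: 16 bytes per element, up to 12 bytes inline. */
enum { VIEW_SIZE = 16, INLINE_MAX = 12 };

/*
 * Hypothetical helper (not an Arrow API): scan every view and record,
 * for each variadic data buffer, the largest (offset + length) that any
 * view refers to. The result is only a lower bound on the real sizes.
 *
 * views:          pointer to n_views * 16 bytes of view data
 * sizes:          output array with n_data_buffers entries
 * Returns 0 on success, -1 if a view names an out-of-range buffer.
 */
static int min_data_buffer_sizes(const uint8_t* views, int64_t n_views,
                                 int64_t* sizes, int64_t n_data_buffers) {
  assert(views != NULL || n_views == 0);
  assert(sizes != NULL || n_data_buffers == 0);

  for (int64_t b = 0; b < n_data_buffers; b++) {
    sizes[b] = 0;
  }

  /* This O(n) pass is the cost the buffer lengths buffer avoids. */
  for (int64_t i = 0; i < n_views; i++) {
    const uint8_t* v = views + i * VIEW_SIZE;
    int32_t length, buffer_index, offset;

    memcpy(&length, v, sizeof(length));
    if (length <= INLINE_MAX) {
      continue; /* data lives inside the view itself */
    }

    memcpy(&buffer_index, v + 8, sizeof(buffer_index));
    memcpy(&offset, v + 12, sizeof(offset));
    if (buffer_index < 0 || buffer_index >= n_data_buffers) {
      return -1;
    }

    int64_t end = (int64_t)offset + length;
    if (end > sizes[buffer_index]) {
      sizes[buffer_index] = end;
    }
  }
  return 0;
}
```

With the extra buffer of lengths, a consumer could presumably read the sizes directly instead of running this pass, which is what makes the single added allocation per array a reasonable trade.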