> > I think the `the same length` means the length of the struct array, this is > similar in the case of RecordBatch where the `num_rows` of a RecordBatch > can be different to the length of its fields.
I didn't think this is true and I thought we had prior discussions on the mailing list suggesting that all top level arrays should have the same length in the IPC format. ASSERT_EQ(sliced->length(), 2); > // Note: StructArray::field pushes its offset and length into the created > array > ASSERT_EQ(sliced->field(0)->length(), 2); >From an IPC perspective this is what I would expect (the buffers themselves could be longer than needed but they are sliced by the length), IIRC, isn't ArrayData an internal C++ detail? Thanks, Micah On Sat, May 6, 2023 at 7:34 AM Antoine Pitrou <anto...@python.org> wrote: > > I agree with Gang. The fact that a struct field may be backed by a > physically larger C++ ArrayData is irrelevant, as long as it's logically > interpreted as having "the same length". > > (however, this implementation detail should ideally not leak into IPC or > C Data exports) > > Regards > > Antoine. > > > Le 06/05/2023 à 03:31, Gang Wu a écrit : > > IMHO, this is valid. As you have demonstrated in the example, a sliced > > struct array will result in a length shorter than its child arrays. This > > kind > > of flexibility can make it easy to reuse child arrays within the struct > > array. > > > >> Struct: a nested layout consisting of a collection of named child fields > >> each having the same length but possibly different types. > > > > I think the `the same length` means the length of the struct array, this > is > > similar in the case of RecordBatch where the `num_rows` of a RecordBatch > > can be different to the length of its fields. > > > > Best, > > Gang > > > > > > On Sat, May 6, 2023 at 1:45 AM Weston Pace <weston.p...@gmail.com> > wrote: > > > >> We allow arrays to have a shorter length than their buffers. Is it also > >> legal for a struct array to have a shorter length than its child arrays? > >> For example, in C++, I can create this today by slicing a struct array: > >> > >> ``` > >> std::shared_ptr<StructArray> my_array = > >> std::dynamic_pointer_cast<StructArray>(array); > >> ASSERT_EQ(my_array->length(), 4); > >> ASSERT_EQ(my_array->field(0)->length(), 4); > >> auto sliced = > std::dynamic_pointer_cast<StructArray>(my_array->Slice(2)); > >> ASSERT_EQ(sliced->length(), 2); > >> // Note: StructArray::field pushes its offset and length into the > created > >> array > >> ASSERT_EQ(sliced->field(0)->length(), 2); > >> // However, the actually ArrayData objects show the truth > >> ASSERT_EQ(sliced->data()->child_data[0]->length, 4); > >> // Our validator thinks this is ok > >> ASSERT_OK(sliced->ValidateFull()); > >> ``` > >> > >> The only reference I can find in the spec is this: > >> > >>> Struct: a nested layout consisting of a collection of named child > fields > >> each > >>> having the same length but possibly different types. > >> > >> This seems to suggest that the C++ implementation is doing something > >> incorrect. > >> > >> I'm asking because I've started to encounter some issues relating to > >> this[1][2] and I'm not sure if the struct array itself is the issue or > the > >> fact that we aren't expecting these kinds of struct arrays is the > problem. > >> > >> [1] https://github.com/apache/arrow/issues/35450 > >> [2] https://github.com/apache/arrow/issues/35452 > >> > > >