>
> I think the `the same length` means the length of the struct array, this is
> similar in the case of RecordBatch where the `num_rows` of a RecordBatch
> can be different to the length of its fields.


 I didn't think this is true and I thought we had prior discussions on the
mailing list suggesting that all top level arrays should have the same
length in the IPC format.

  ASSERT_EQ(sliced->length(), 2);
>   // Note: StructArray::field pushes its offset and length into the created
> array
>   ASSERT_EQ(sliced->field(0)->length(), 2);

>From an IPC perspective this is what I would expect (the buffers themselves
could be longer than needed but they are sliced by the length), IIRC, isn't
ArrayData an internal C++ detail?

Thanks,
Micah



On Sat, May 6, 2023 at 7:34 AM Antoine Pitrou <anto...@python.org> wrote:

>
> I agree with Gang. The fact that a struct field may be backed by a
> physically larger C++ ArrayData is irrelevant, as long as it's logically
> interpreted as having "the same length".
>
> (however, this implementation detail should ideally not leak into IPC or
> C Data exports)
>
> Regards
>
> Antoine.
>
>
> Le 06/05/2023 à 03:31, Gang Wu a écrit :
> > IMHO, this is valid. As you have demonstrated in the example, a sliced
> > struct array will result in a length shorter than its child arrays. This
> > kind
> > of flexibility can make it easy to reuse child arrays within the struct
> > array.
> >
> >> Struct: a nested layout consisting of a collection of named child fields
> >> each having the same length but possibly different types.
> >
> > I think the `the same length` means the length of the struct array, this
> is
> > similar in the case of RecordBatch where the `num_rows` of a RecordBatch
> > can be different to the length of its fields.
> >
> > Best,
> > Gang
> >
> >
> > On Sat, May 6, 2023 at 1:45 AM Weston Pace <weston.p...@gmail.com>
> wrote:
> >
> >> We allow arrays to have a shorter length than their buffers.  Is it also
> >> legal for a struct array to have a shorter length than its child arrays?
> >> For example, in C++, I can create this today by slicing a struct array:
> >>
> >> ```
> >>    std::shared_ptr<StructArray> my_array =
> >> std::dynamic_pointer_cast<StructArray>(array);
> >>    ASSERT_EQ(my_array->length(), 4);
> >>    ASSERT_EQ(my_array->field(0)->length(), 4);
> >>    auto sliced =
> std::dynamic_pointer_cast<StructArray>(my_array->Slice(2));
> >>    ASSERT_EQ(sliced->length(), 2);
> >>    // Note: StructArray::field pushes its offset and length into the
> created
> >> array
> >>    ASSERT_EQ(sliced->field(0)->length(), 2);
> >>    // However, the actually ArrayData objects show the truth
> >>    ASSERT_EQ(sliced->data()->child_data[0]->length, 4);
> >>    // Our validator thinks this is ok
> >>    ASSERT_OK(sliced->ValidateFull());
> >> ```
> >>
> >> The only reference I can find in the spec is this:
> >>
> >>> Struct: a nested layout consisting of a collection of named child
> fields
> >> each
> >>> having the same length but possibly different types.
> >>
> >> This seems to suggest that the C++ implementation is doing something
> >> incorrect.
> >>
> >> I'm asking because I've started to encounter some issues relating to
> >> this[1][2] and I'm not sure if the struct array itself is the issue or
> the
> >> fact that we aren't expecting these kinds of struct arrays is the
> problem.
> >>
> >> [1] https://github.com/apache/arrow/issues/35450
> >> [2] https://github.com/apache/arrow/issues/35452
> >>
> >
>

Reply via email to