IMHO, this is valid. As you have demonstrated in the example, slicing a
struct array yields a struct array whose length is shorter than that of
its child arrays. This kind of flexibility makes it easy to reuse child
arrays within the struct array.

> Struct: a nested layout consisting of a collection of named child fields
> each having the same length but possibly different types.

I think "the same length" refers to the length of the struct array. This is
similar to the case of RecordBatch, where the `num_rows` of a RecordBatch
can differ from the length of its fields.
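
To make that reading concrete, here is a minimal sketch of the idea. It is
a simplified model, not the Arrow C++ API: `ToyArrayData`, `SliceStruct`,
`FieldValues` and `ChildrenCoverParent` are hypothetical names, and the
children are assumed to be plain int32 arrays. It only shows how a parent
offset and length can select a window into longer, shared child arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for an array's data (NOT arrow::ArrayData).
struct ToyArrayData {
  int64_t length = 0;
  int64_t offset = 0;
  std::shared_ptr<std::vector<int32_t>> values;           // leaf values; null for a struct
  std::vector<std::shared_ptr<ToyArrayData>> child_data;  // children of a struct
};

// Slicing a struct only adjusts the parent's offset and length.
// The children are shared as-is, so they keep their original length.
std::shared_ptr<ToyArrayData> SliceStruct(const ToyArrayData& parent, int64_t offset) {
  assert(offset >= 0 && offset <= parent.length);
  auto out = std::make_shared<ToyArrayData>(parent);  // shallow copy, children shared
  out->offset = parent.offset + offset;
  out->length = parent.length - offset;
  return out;
}

// Logical view of child i: the parent's offset and length are applied to the child.
std::vector<int32_t> FieldValues(const ToyArrayData& parent, std::size_t i) {
  const ToyArrayData& child = *parent.child_data[i];
  auto begin = child.values->begin() + child.offset + parent.offset;
  return std::vector<int32_t>(begin, begin + parent.length);
}

// One validity rule consistent with this reading (an assumption of this sketch):
// every child must be at least long enough to cover the parent's window.
bool ChildrenCoverParent(const ToyArrayData& parent) {
  for (const auto& child : parent.child_data) {
    if (child->length < parent.offset + parent.length) return false;
  }
  return true;
}
```

With a 4-element child holding {10, 20, 30, 40}, `SliceStruct(parent, 2)`
gives a parent of length 2 whose child still has length 4, and
`FieldValues` sees only {30, 40}.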

Best,
Gang


On Sat, May 6, 2023 at 1:45 AM Weston Pace <weston.p...@gmail.com> wrote:

> We allow arrays to have a shorter length than their buffers.  Is it also
> legal for a struct array to have a shorter length than its child arrays?
> For example, in C++, I can create this today by slicing a struct array:
>
> ```
>   std::shared_ptr<StructArray> my_array =
>       std::dynamic_pointer_cast<StructArray>(array);
>   ASSERT_EQ(my_array->length(), 4);
>   ASSERT_EQ(my_array->field(0)->length(), 4);
>   auto sliced = std::dynamic_pointer_cast<StructArray>(my_array->Slice(2));
>   ASSERT_EQ(sliced->length(), 2);
>   // Note: StructArray::field pushes its offset and length into the created array
>   ASSERT_EQ(sliced->field(0)->length(), 2);
>   // However, the actual ArrayData objects show the truth
>   ASSERT_EQ(sliced->data()->child_data[0]->length, 4);
>   // Our validator thinks this is ok
>   ASSERT_OK(sliced->ValidateFull());
> ```
>
> The only reference I can find in the spec is this:
>
> > Struct: a nested layout consisting of a collection of named child fields
> > each having the same length but possibly different types.
>
> This seems to suggest that the C++ implementation is doing something
> incorrect.
>
> I'm asking because I've started to encounter some issues relating to
> this[1][2], and I'm not sure whether the struct array itself is the problem
> or whether the problem is that we aren't expecting these kinds of struct
> arrays.
>
> [1] https://github.com/apache/arrow/issues/35450
> [2] https://github.com/apache/arrow/issues/35452
>
