I agree with Gang. The fact that a struct field may be backed by a physically larger C++ ArrayData is irrelevant, as long as it's logically interpreted as having "the same length".

(However, this implementation detail should ideally not leak into IPC or C Data Interface exports.)

Regards

Antoine.


On 06/05/2023 at 03:31, Gang Wu wrote:
IMHO, this is valid. As you have demonstrated in the example, slicing a
struct array yields a length shorter than that of its child arrays. This
kind of flexibility makes it easy to reuse child arrays within the struct
array.

Struct: a nested layout consisting of a collection of named child fields
each having the same length but possibly different types.

I think `the same length` here means the length of the struct array. This is
similar to the case of RecordBatch, where the `num_rows` of a RecordBatch
can differ from the length of its fields.
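
To make the "logical length" reading concrete, here is a minimal sketch of a
toy model. It is not Arrow's real API: ToyArrayData, SliceParent,
LogicalField and ChildrenCoverParent are hypothetical names invented for
illustration, and the coverage rule in the last function is an assumption
about what a validator would need to check under this reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for arrow::ArrayData (hypothetical, not the real class).
struct ToyArrayData {
  int64_t length = 0;  // logical number of elements
  int64_t offset = 0;  // logical start within the underlying storage
  std::vector<std::shared_ptr<ToyArrayData>> child_data;
};

// Slice only the parent: the children are shared and left untouched,
// so they stay physically longer than the sliced parent.
std::shared_ptr<ToyArrayData> SliceParent(const ToyArrayData& parent,
                                          int64_t start) {
  assert(start >= 0 && start <= parent.length);
  auto out = std::make_shared<ToyArrayData>(parent);  // shallow copy
  out->offset = parent.offset + start;
  out->length = parent.length - start;
  return out;
}

// Logical view of child i: the parent's offset and length are applied
// on top of the child's own offset.
ToyArrayData LogicalField(const ToyArrayData& parent, std::size_t i) {
  ToyArrayData view = *parent.child_data[i];
  view.offset += parent.offset;
  view.length = parent.length;
  return view;
}

// Assumed validity rule: every child must be long enough to cover
// the window [offset, offset + length) selected by the parent.
bool ChildrenCoverParent(const ToyArrayData& parent) {
  for (const auto& child : parent.child_data) {
    if (child->length < parent.offset + parent.length) return false;
  }
  return true;
}
```

In this model, slicing a length-4 struct at 2 gives a parent of length 2
whose child still reports length 4, while LogicalField returns a view of
length 2. That is the sense in which every field has "the same length".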

Best,
Gang


On Sat, May 6, 2023 at 1:45 AM Weston Pace <weston.p...@gmail.com> wrote:

We allow arrays to have a shorter length than their buffers.  Is it also
legal for a struct array to have a shorter length than its child arrays?
For example, in C++, I can create this today by slicing a struct array:

```
   std::shared_ptr<StructArray> my_array =
       std::dynamic_pointer_cast<StructArray>(array);
   ASSERT_EQ(my_array->length(), 4);
   ASSERT_EQ(my_array->field(0)->length(), 4);
   auto sliced = std::dynamic_pointer_cast<StructArray>(my_array->Slice(2));
   ASSERT_EQ(sliced->length(), 2);
   // Note: StructArray::field pushes its offset and length into the
   // created array
   ASSERT_EQ(sliced->field(0)->length(), 2);
   // However, the actual ArrayData objects show the truth
   ASSERT_EQ(sliced->data()->child_data[0]->length, 4);
   // Our validator thinks this is ok
   ASSERT_OK(sliced->ValidateFull());
```

The only reference I can find in the spec is this:

Struct: a nested layout consisting of a collection of named child fields
each having the same length but possibly different types.

This seems to suggest that the C++ implementation is doing something
incorrect.

I'm asking because I've started to encounter some issues relating to
this [1][2], and I'm not sure whether the problem is the struct array
itself or the fact that we aren't expecting these kinds of struct arrays.

[1] https://github.com/apache/arrow/issues/35450
[2] https://github.com/apache/arrow/issues/35452

