I agree with Gang. The fact that a struct field may be backed by a
physically larger C++ ArrayData is irrelevant, as long as it's logically
interpreted as having "the same length".
(However, this implementation detail should ideally not leak into IPC or
C Data exports.)
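One possible reading of "not leaking" is that an exporter would trim each child to the parent's logical window before handing the data off. The sketch below shows that idea on a toy model only. `ToyArrayData` and `NormalizeForExport` are hypothetical names invented for illustration, not Arrow's classes or API, and buffers are not modeled at all.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for an array's metadata; NOT arrow::ArrayData.
struct ToyArrayData {
  int64_t length = 0;
  int64_t offset = 0;
  std::vector<std::shared_ptr<ToyArrayData>> child_data;
};

// Hypothetical helper: returns an equivalent struct whose own offset is 0
// and whose children cover exactly the parent's logical window.
std::shared_ptr<ToyArrayData> NormalizeForExport(const ToyArrayData& parent) {
  auto out = std::make_shared<ToyArrayData>();
  out->length = parent.length;
  out->offset = 0;
  for (const auto& child : parent.child_data) {
    // Each child must be at least as long as the parent's window requires.
    assert(child->length >= parent.offset + parent.length);
    // Shallow copy: only offset and length change, the original is untouched.
    auto trimmed = std::make_shared<ToyArrayData>(*child);
    trimmed->offset = child->offset + parent.offset;
    trimmed->length = parent.length;
    out->child_data.push_back(trimmed);
  }
  return out;
}
```

With a parent of length 2 and offset 2 over a child of length 4, the result has offset 0 and a child view of length 2 at offset 2, so the exported parent and children agree on their length.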
Regards
Antoine.
On 06/05/2023 at 03:31, Gang Wu wrote:
IMHO, this is valid. As you have demonstrated in the example, a sliced
struct array will result in a length shorter than its child arrays. This
kind of flexibility can make it easy to reuse child arrays within the
struct array.
Struct: a nested layout consisting of a collection of named child fields
each having the same length but possibly different types.
I think `the same length` means the length of the struct array itself. This
is similar to the case of RecordBatch, where the `num_rows` of a RecordBatch
can differ from the length of its fields.
Best,
Gang
On Sat, May 6, 2023 at 1:45 AM Weston Pace <weston.p...@gmail.com> wrote:
We allow arrays to have a shorter length than their buffers. Is it also
legal for a struct array to have a shorter length than its child arrays?
For example, in C++, I can create this today by slicing a struct array:
```
std::shared_ptr<StructArray> my_array =
    std::dynamic_pointer_cast<StructArray>(array);
ASSERT_EQ(my_array->length(), 4);
ASSERT_EQ(my_array->field(0)->length(), 4);
auto sliced = std::dynamic_pointer_cast<StructArray>(my_array->Slice(2));
ASSERT_EQ(sliced->length(), 2);
// Note: StructArray::field pushes its offset and length into the created array
ASSERT_EQ(sliced->field(0)->length(), 2);
// However, the actual ArrayData objects show the truth
ASSERT_EQ(sliced->data()->child_data[0]->length, 4);
// Our validator thinks this is ok
ASSERT_OK(sliced->ValidateFull());
```
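For readers without an Arrow build at hand, the bookkeeping in the snippet above can be mimicked with a small toy model. Everything here (`ToyArrayData`, `MakeToyStruct`, `SliceStruct`, `FieldView`) is a hypothetical stand-in rather than Arrow's API. It only mirrors the offset and length handling described in the comments above, under the assumption that slicing rewrites the parent's window and leaves the children shared and untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for an array's metadata; NOT arrow::ArrayData.
struct ToyArrayData {
  int64_t length = 0;
  int64_t offset = 0;
  std::vector<std::shared_ptr<ToyArrayData>> child_data;
};

// Builds a struct-like parent whose children all have the parent's length.
std::shared_ptr<ToyArrayData> MakeToyStruct(int64_t length, int num_children) {
  auto parent = std::make_shared<ToyArrayData>();
  parent->length = length;
  for (int i = 0; i < num_children; ++i) {
    auto child = std::make_shared<ToyArrayData>();
    child->length = length;
    parent->child_data.push_back(child);
  }
  return parent;
}

// Slicing only rewrites the parent's offset and length; children stay shared.
std::shared_ptr<ToyArrayData> SliceStruct(
    const std::shared_ptr<ToyArrayData>& parent, int64_t start) {
  assert(start >= 0 && start <= parent->length);
  auto out = std::make_shared<ToyArrayData>(*parent);  // shallow copy
  out->offset = parent->offset + start;
  out->length = parent->length - start;
  return out;
}

// Field accessor that pushes the parent's offset and length into a child view.
std::shared_ptr<ToyArrayData> FieldView(const ToyArrayData& parent,
                                        std::size_t i) {
  auto view = std::make_shared<ToyArrayData>(*parent.child_data.at(i));
  view->offset += parent.offset;
  view->length = parent.length;
  return view;
}
```

Calling `SliceStruct(MakeToyStruct(4, 1), 2)` yields a parent of length 2 whose stored child still reports length 4, while `FieldView` on the sliced parent reports length 2. That is the same mismatch the assertions above expose.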
The only reference I can find in the spec is this:
Struct: a nested layout consisting of a collection of named child fields
each having the same length but possibly different types.
This seems to suggest that the C++ implementation is doing something
incorrect.
I'm asking because I've started to encounter some issues relating to
this [1][2], and I'm not sure whether the struct array itself is the problem
or whether the problem is that we aren't expecting these kinds of struct arrays.
[1] https://github.com/apache/arrow/issues/35450
[2] https://github.com/apache/arrow/issues/35452