felipecrv commented on code in PR #41092:
URL: https://github.com/apache/arrow/pull/41092#discussion_r1558105300
##########
cpp/src/arrow/array/array_nested.h:
##########
@@ -189,6 +189,11 @@ class ARROW_EXPORT ListArray : public
BaseListArray<ListType> {
Result<std::shared_ptr<Array>> Flatten(
MemoryPool* memory_pool = default_memory_pool()) const;
+ /// \brief Flatten all level recursively until reach a non-list type, and
return a
+ /// non-list type Array.
Review Comment:
I think we should be explicit and list the types that we consider as logical
lists here. It should include lists, list-views, fixed-size-lists, and map.
<img width="1615" alt="Screenshot 2024-04-09 at 15 06 26"
src="https://github.com/apache/arrow/assets/207795/e89cb9fc-a707-4d73-b83d-d20efca71ec7">
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -216,12 +216,26 @@ static std::shared_ptr<Array> SliceArrayWithOffsets(const
Array& array, int64_t
return array.Slice(begin, end - begin);
}
+namespace {
+struct FlattenWithRecursion {
+ // Flatten all list-like types array recursively
+ static Result<std::shared_ptr<Array>> Flatten(const Array& array, bool
with_recursion,
+ MemoryPool* memory_pool);
+};
+} // namespace
+
template <typename ListArrayT>
Result<std::shared_ptr<Array>> FlattenListArray(const ListArrayT& list_array,
+ bool with_recursion,
MemoryPool* memory_pool) {
const int64_t list_array_length = list_array.length();
std::shared_ptr<arrow::Array> value_array = list_array.values();
+ // If it's a nested-list related array, flatten recursively.
+ if (is_list_like(value_array->type_id()) && with_recursion) {
+ return FlattenWithRecursion::Flatten(*value_array, with_recursion,
memory_pool);
+ }
+
Review Comment:
Only the recursive version should delegate to this one, not the other way
around. So this function shouldn't gain a boolean parameter.
##########
cpp/src/arrow/array/array_nested.h:
##########
@@ -189,6 +189,11 @@ class ARROW_EXPORT ListArray : public
BaseListArray<ListType> {
Result<std::shared_ptr<Array>> Flatten(
MemoryPool* memory_pool = default_memory_pool()) const;
+ /// \brief Flatten all level recursively until reach a non-list type, and
return a
+ /// non-list type Array.
Review Comment:
And you only have to declare it on `VarLengthListLikeArray<>` and
`FixedSizeListArray` -- they can all simply call the same non-inline function
that dispatches based on `type->id()`. That extra check won't be significant
for a function that scans the entire array and allocates.
##########
cpp/src/arrow/array/array_nested.h:
##########
@@ -189,6 +189,11 @@ class ARROW_EXPORT ListArray : public
BaseListArray<ListType> {
Result<std::shared_ptr<Array>> Flatten(
MemoryPool* memory_pool = default_memory_pool()) const;
+ /// \brief Flatten all level recursively until reach a non-list type, and
return a
+ /// non-list type Array.
+ Result<std::shared_ptr<Array>> FlattenRecursion(
+ MemoryPool* memory_pool = default_memory_pool()) const;
Review Comment:
Name suggestion: `FlattenRecursively`
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -351,6 +371,44 @@ Result<std::shared_ptr<Array>> FlattenListViewArray(const
ListViewArrayT& list_v
return Concatenate(slices, memory_pool);
}
+Result<std::shared_ptr<Array>> FlattenWithRecursion::Flatten(const Array&
array,
+ bool
with_recursion,
+ MemoryPool*
memory_pool) {
+ const bool has_nulls = array.null_count() > 0;
+ switch (array.type_id()) {
+ case Type::LIST:
+ return FlattenListArray(checked_cast<const ListArray&>(array),
with_recursion,
+ memory_pool);
Review Comment:
Instead of doing this, you should have a `while` loop that successfully
calls `FlattenListArray`. No `bool with_recursion` should be necessary.
The while loop means we don't stack overflow if someone passes a very deeply
nested logical list array. In the future, we should come to this function and
optimize it specifically for the recursive use-case. If `depth==1`, it's
trivial to delegate to the non-recursive version, otherwise there are many
interesting special cases that can be very efficient to flatten. But for now
the `while` loop with successive flatten calls will do.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]