felipecrv commented on code in PR #41092:
URL: https://github.com/apache/arrow/pull/41092#discussion_r1558105300


##########
cpp/src/arrow/array/array_nested.h:
##########
@@ -189,6 +189,11 @@ class ARROW_EXPORT ListArray : public 
BaseListArray<ListType> {
   Result<std::shared_ptr<Array>> Flatten(
       MemoryPool* memory_pool = default_memory_pool()) const;
 
+  /// \brief Flatten all level recursively until reach a non-list type, and 
return a
+  /// non-list type Array.

Review Comment:
   I think we should be explicit and list the types that we consider as logical 
lists here. It should include lists, list-views, fixed-size-lists, and map.
   
   <img width="1615" alt="Screenshot 2024-04-09 at 15 06 26" 
src="https://github.com/apache/arrow/assets/207795/e89cb9fc-a707-4d73-b83d-d20efca71ec7";>
   



##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -216,12 +216,26 @@ static std::shared_ptr<Array> SliceArrayWithOffsets(const 
Array& array, int64_t
   return array.Slice(begin, end - begin);
 }
 
+namespace {
+struct FlattenWithRecursion {
+  // Flatten all list-like types array recursively
+  static Result<std::shared_ptr<Array>> Flatten(const Array& array, bool 
with_recursion,
+                                                MemoryPool* memory_pool);
+};
+}  // namespace
+
 template <typename ListArrayT>
 Result<std::shared_ptr<Array>> FlattenListArray(const ListArrayT& list_array,
+                                                bool with_recursion,
                                                 MemoryPool* memory_pool) {
   const int64_t list_array_length = list_array.length();
   std::shared_ptr<arrow::Array> value_array = list_array.values();
 
+  // If it's a nested-list related array, flatten recursively.
+  if (is_list_like(value_array->type_id()) && with_recursion) {
+    return FlattenWithRecursion::Flatten(*value_array, with_recursion, 
memory_pool);
+  }
+

Review Comment:
   Only the recursive version should delegate to this one, not the other way 
around. So this function shouldn't gain a boolean parameter.



##########
cpp/src/arrow/array/array_nested.h:
##########
@@ -189,6 +189,11 @@ class ARROW_EXPORT ListArray : public 
BaseListArray<ListType> {
   Result<std::shared_ptr<Array>> Flatten(
       MemoryPool* memory_pool = default_memory_pool()) const;
 
+  /// \brief Flatten all level recursively until reach a non-list type, and 
return a
+  /// non-list type Array.

Review Comment:
   And you only have to declare it on `VarLengthListLikeArray<>` and 
`FixedSizeListArray` -- they can all simply call the same non-inline function 
that dispatches based on `type->id()`. That extra check won't be significant 
for a function that scans the entire array and allocates.



##########
cpp/src/arrow/array/array_nested.h:
##########
@@ -189,6 +189,11 @@ class ARROW_EXPORT ListArray : public 
BaseListArray<ListType> {
   Result<std::shared_ptr<Array>> Flatten(
       MemoryPool* memory_pool = default_memory_pool()) const;
 
+  /// \brief Flatten all level recursively until reach a non-list type, and 
return a
+  /// non-list type Array.
+  Result<std::shared_ptr<Array>> FlattenRecursion(
+      MemoryPool* memory_pool = default_memory_pool()) const;

Review Comment:
   Name suggestion: `FlattenRecursively`



##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -351,6 +371,44 @@ Result<std::shared_ptr<Array>> FlattenListViewArray(const 
ListViewArrayT& list_v
   return Concatenate(slices, memory_pool);
 }
 
+Result<std::shared_ptr<Array>> FlattenWithRecursion::Flatten(const Array& 
array,
+                                                             bool 
with_recursion,
+                                                             MemoryPool* 
memory_pool) {
+  const bool has_nulls = array.null_count() > 0;
+  switch (array.type_id()) {
+    case Type::LIST:
+      return FlattenListArray(checked_cast<const ListArray&>(array), 
with_recursion,
+                              memory_pool);

Review Comment:
   Instead of doing this, you should have a `while` loop that successfully 
calls `FlattenListArray`. No `bool with_recursion` should be necessary.
   
   The while loop means we don't stack overflow if someone passes a very deeply 
nested logical list array. In the future, we should come to this function and 
optimize it specifically for the recursive use-case. If `depth==1`, it's 
trivial to delegate to the non-recursive version, otherwise there are many 
interesting special cases that can be very efficient to flatten. But for now 
the `while` loop with successive flatten calls will do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to