felipecrv commented on code in PR #41295:
URL: https://github.com/apache/arrow/pull/41295#discussion_r1572588581
##########
python/pyarrow/array.pxi:
##########
@@ -2141,22 +2141,99 @@ cdef class Decimal256Array(FixedSizeBinaryArray):
cdef class BaseListArray(Array):
- def flatten(self):
+ def flatten(self, recursively=False):
"""
- Unnest this ListArray/LargeListArray by one level.
-
- The returned Array is logically a concatenation of all the sub-lists
- in this Array.
+ Unnest this [Large]ListArray/[Large]ListViewArray/FixedSizeListArray
+ according to 'recursively'.
Note that this method is different from ``self.values`` in that
it takes care of the slicing offset as well as null elements backed
by non-empty sub-lists.
+ Parameters
+ ----------
+ recursively : bool, defalut false, optional
+ When true, flatten this logical list-array recursively until an
+ array of non-list values is reached.
+ When false, flatten this logical list-array by one level
+
Returns
-------
result : Array
+
+ Examples
+ --------
+
+ Basic logical list-array's flatten
+ >>> import pyarrow as pa
+ >>> values = [1, 2, 3, 4]
+ >>> offsets = [2, 1, 0]
+ >>> sizes = [2, 2, 2]
+ >>> array = pa.ListViewArray.from_arrays(offsets, sizes, values)
+ >>> array
+ <pyarrow.lib.ListViewArray object at ...>
+ [
+ [
+ 3,
+ 4
+ ],
+ [
+ 2,
+ 3
+ ],
+ [
+ 1,
+ 2
+ ]
+ ]
+ >>> array.flatten()
+ <pyarrow.lib.Int64Array object at ...>
+ [
+ 3,
+ 4,
+ 2,
+ 3,
+ 1,
+ 2
+ ]
+
+ If an logical list-array is nested with multi-level, the array will
+ be flattened recursively until an array of non-list values is reached
+ if we enable recursively=True.
Review Comment:
keep "recursively" on the line above but the parameter name should be
"recursive"
##########
cpp/src/arrow/compute/api_vector.h:
##########
@@ -245,6 +245,18 @@ class ARROW_EXPORT PairwiseOptions : public
FunctionOptions {
int64_t periods = 1;
};
+/// \brief Options for list_flatten function
+class ARROW_EXPORT ListFlattenOptions : public FunctionOptions {
+ public:
+ explicit ListFlattenOptions(bool recursively = false);
Review Comment:
`recursive` for consistency
##########
cpp/src/arrow/compute/kernels/vector_nested.cc:
##########
@@ -107,10 +113,13 @@ struct ListParentIndicesArray {
const FunctionDoc list_flatten_doc(
"Flatten list values",
- ("`lists` must have a list-like type.\n"
- "Return an array with the top list level flattened.\n"
+ ("`lists` must have a logical list type like `[Large]ListType`, \n"
+ "`[Large]ListViewType` and `FixedSizeListType`. \n"
+ "Whether to flatten the top list level or the bottom list level \n"
+ "will be decided based on the `recursive` option specified in \n"
+ ":struct:`ListFlattenOptions`. \n"
Review Comment:
@jorisvandenbossche do you know all the places where this text ends up at?
Is it correct? Is the `:struct:` markup going to be interpreted in an useful
way?
##########
cpp/src/arrow/compute/kernels/codegen_internal.cc:
##########
@@ -56,9 +56,16 @@ Result<TypeHolder> LastType(KernelContext*, const
std::vector<TypeHolder>& types
return types.back();
}
-Result<TypeHolder> ListValuesType(KernelContext*, const
std::vector<TypeHolder>& args) {
- const auto& list_type = checked_cast<const BaseListType&>(*args[0].type);
- return list_type.value_type().get();
+Result<TypeHolder> ListValuesType(KernelContext* ctx,
+ const std::vector<TypeHolder>& args) {
+ auto list_type = checked_cast<const BaseListType*>(args[0].type);
+ auto value_type = list_type->value_type().get();
+ for (auto value_kind = value_type->id();
+ is_list(value_kind) || is_list_view(value_kind); value_kind =
value_type->id()) {
+ list_type = checked_cast<const
BaseListType*>(list_type->value_type().get());
+ value_type = list_type->value_type().get();
+ }
Review Comment:
Previous implementation of this function would return a `list<int32>` type
if you passed `list<list<int32>>`. Not it returns `int32`. Are you sure that is
safe for all the usages? If yes, the name should reflect that it's recursive
now. Since "list values type" has a well-defined meaning in the Arrow types.
##########
python/pyarrow/_compute.pyx:
##########
@@ -2035,6 +2035,26 @@ class PairwiseOptions(_PairwiseOptions):
self._set_options(period)
+cdef class _ListFlattenOptions(FunctionOptions):
+ def _set_options(self, recursively):
+ self.wrapped.reset(new CListFlattenOptions(recursively))
+
+
+class ListFlattenOptions(_ListFlattenOptions):
+ """
+ Options for `list_flatten` function
+
+ Parameters
+ ----------
+ recursively : bool, defalut false
Review Comment:
-> recursive (very common param name in Python functions)
##########
python/pyarrow/array.pxi:
##########
@@ -2141,22 +2141,99 @@ cdef class Decimal256Array(FixedSizeBinaryArray):
cdef class BaseListArray(Array):
- def flatten(self):
+ def flatten(self, recursively=False):
"""
- Unnest this ListArray/LargeListArray by one level.
-
- The returned Array is logically a concatenation of all the sub-lists
- in this Array.
+ Unnest this [Large]ListArray/[Large]ListViewArray/FixedSizeListArray
+ according to 'recursively'.
Note that this method is different from ``self.values`` in that
it takes care of the slicing offset as well as null elements backed
by non-empty sub-lists.
+ Parameters
+ ----------
+ recursively : bool, defalut false, optional
Review Comment:
recursive
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]