felipecrv commented on code in PR #35345:
URL: https://github.com/apache/arrow/pull/35345#discussion_r1362125122
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -189,11 +260,113 @@ Result<std::shared_ptr<Array>> FlattenListArray(const ListArrayT& list_array,
return Concatenate(non_null_fragments, memory_pool);
}
+template <typename ListViewArrayT>
+Result<std::shared_ptr<Array>> FlattenListViewArray(const ListViewArrayT& list_view_array,
+ MemoryPool* memory_pool) {
+ using offset_type = typename ListViewArrayT::offset_type;
+ const int64_t list_view_array_length = list_view_array.length();
+ std::shared_ptr<arrow::Array> value_array = list_view_array.values();
+
+ if (list_view_array_length == 0) {
+ return SliceArrayWithOffsets(*value_array, 0, 0);
+ }
+
+ // If the list array is *all* nulls, then just return an empty array.
+ if (list_view_array.null_count() == list_view_array.length()) {
+ return MakeEmptyArray(value_array->type(), memory_pool);
+ }
+
+ const auto* validity = list_view_array.data()->template GetValues<uint8_t>(0, 0);
+ const auto* offsets = list_view_array.data()->template GetValues<offset_type>(1);
+ const auto* sizes = list_view_array.data()->template GetValues<offset_type>(2);
+
+ // If a ListViewArray:
+ //
+ // 1) does not contain nulls
+ // 2) has sorted offsets
+ // 3) has disjoint views which completely cover the values array
+ //
+ // then simply slice its value array with the first offset and end of the last list
+ // view.
+ if (list_view_array.null_count() == 0) {
+ bool sorted_and_disjoint = true;
+ for (int64_t i = 1; sorted_and_disjoint && i < list_view_array_length; ++i) {
+ sorted_and_disjoint &=
+ sizes[i - 1] == 0 || offsets[i] - offsets[i - 1] == sizes[i - 1];
+ }
Review Comment:
An advantage of skipping the check is that when `sizes[i] == 0` we don't have
to require the offset to be anything specific, as long as the next list-view
starts right where the previous non-empty list-view ended.
Your comment made me realize one problem, though: if the list-view array
starts or ends with an empty list-view, I could end up using arbitrary
offsets to construct the slice at the end. The new implementation reduces the
number of special cases, has an informal proof of correctness by induction, and
has more tests.
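
To make the idea concrete, here is a minimal standalone sketch of the invariant
described above. It is not the code from this PR: `ContiguousValuesRange` is a
hypothetical helper, it uses `std::vector<int32_t>` instead of Arrow buffers,
and it assumes there are no nulls. Empty list-views are skipped, so their
offsets never influence the result, and the slice bounds come only from the
first and last non-empty list-views.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow). Returns the [begin, end) range of
// the values array covered by the list-views when the non-empty views are in
// order and contiguous, or std::nullopt otherwise.
//
// Invariant after each iteration: [begin, end) is exactly the concatenation
// of all non-empty views seen so far.
std::optional<std::pair<int64_t, int64_t>> ContiguousValuesRange(
    const std::vector<int32_t>& offsets, const std::vector<int32_t>& sizes) {
  assert(offsets.size() == sizes.size());
  int64_t begin = 0;
  int64_t end = 0;
  bool seen_non_empty = false;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    if (sizes[i] == 0) {
      continue;  // the offset of an empty view is irrelevant
    }
    if (!seen_non_empty) {
      begin = offsets[i];  // first non-empty view fixes the start
      seen_non_empty = true;
    } else if (offsets[i] != end) {
      return std::nullopt;  // gap, overlap, or out-of-order view
    }
    end = static_cast<int64_t>(offsets[i]) + sizes[i];
  }
  // When every view is empty this is the empty range (0, 0).
  return std::make_pair(begin, end);
}
```

When a range is returned, the flattened result would be a slice of the values
array of length `end - begin` starting at `begin`; otherwise a slower path that
concatenates the individual views is still needed.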