WillAyd commented on code in PR #14505:
URL: https://github.com/apache/arrow/pull/14505#discussion_r1059187675
##########
cpp/src/arrow/compute/kernels/vector_sort.cc:
##########
@@ -2078,6 +2078,293 @@ class ArrayRanker : public TypeVisitor {
Datum* output_;
};
+class ChunkedArrayRanker : public TypeVisitor {
+ public:
+ // TODO: here we accept order / null_placement / tiebreaker as separate
arguments
+ // whereas the ArrayRanker accepts them as the RankOptions struct; this is
consistent
+ // with ArraySorter / ChunkedArraySorter, so likely should refactor
ArrayRanker
+ ChunkedArrayRanker(ExecContext* ctx, uint64_t* indices_begin, uint64_t*
indices_end,
+ const ChunkedArray& chunked_array, const SortOrder order,
+ const NullPlacement null_placement, const
RankOptions::Tiebreaker tiebreaker, Datum* output)
+ : TypeVisitor(),
+ ctx_(ctx),
+ indices_begin_(indices_begin),
+ indices_end_(indices_end),
+ chunked_array_(chunked_array),
+ physical_type_(GetPhysicalType(chunked_array.type())),
+ physical_chunks_(GetPhysicalChunks(chunked_array_, physical_type_)),
+ order_(order),
+ null_placement_(null_placement),
+ tiebreaker_(tiebreaker),
+ output_(output) {}
+
+ Status Run() { return physical_type_->Accept(this); }
+
+#define VISIT(TYPE) \
+ Status Visit(const TYPE& type) { return RankInternal<TYPE>(); }
+
+ VISIT_SORTABLE_PHYSICAL_TYPES(VISIT)
+
+#undef VISIT
+
+ template <typename InType>
+ Status RankInternal() {
+ using GetView = GetViewType<InType>;
+ using T = typename GetViewType<InType>::T;
+ using ArrayType = typename TypeTraits<InType>::ArrayType;
+
+ const auto num_chunks = chunked_array_.num_chunks();
+ if (num_chunks == 0) {
+ return Status::OK();
+ }
+ const auto arrays = GetArrayPointers(physical_chunks_);
+
+ ArraySortOptions array_options(order_, null_placement_);
+
+ ARROW_ASSIGN_OR_RAISE(auto array_sorter, GetArraySorter(*physical_type_));
+
+ // See related ChunkedArraySort method for comments
Review Comment:
I've been jumping through the `ArraySorter` a bit and am not sure of a
really clean way to make that work. I looked at both templating the existing
`ArraySortFunc` as well as creating a new `GetChunkedArraySorter` factory, but
I think both are limited by the fact that `ArraySortFunc` requires an `Array&
argument`, so would have to overhaul things significantly to share code.
The other caveat is that the `ChunkedArraySorter` and `TableSorter`
implementations right now both internally manage their own implementations to
merge the `NullPartitionResult` vector.
I'm thinking of creating a visitor that returns the sorted
NullPartitionResult, though even then am a bit unsure as the ChunkedArray /
Table sort classes currently write the sorted output array as they step through
their internal merge implementations
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]