amol- commented on code in PR #40971:
URL: https://github.com/apache/arrow/pull/40971#discussion_r1561756130
##########
cpp/src/arrow/compute/kernels/vector_selection_filter_internal.cc:
##########
@@ -1039,10 +1039,17 @@ class FilterMetaFunction : public MetaFunction {
if (args[0].kind() == Datum::RECORD_BATCH) {
auto values_batch = args[0].record_batch();
- ARROW_ASSIGN_OR_RAISE(
- std::shared_ptr<RecordBatch> out_batch,
- FilterRecordBatch(*args[0].record_batch(), args[1], options, ctx));
- return Datum(out_batch);
+ if (args[1].kind() == Datum::ARRAY) {
+ ARROW_ASSIGN_OR_RAISE(std::shared_ptr<RecordBatch> out_batch,
+ FilterRecordBatch(*values_batch, args[1], options, ctx));
+ return Datum(out_batch);
+ } else {
Review Comment:
I'm not sure why we would want filtering a RecordBatch to return a Table:
the outcome of filtering a RecordBatch will never be chunked, even when the
filter itself is chunked.
Also, given that the filter length is generally <= the length of the filtered
array, we can probably take for granted that it will always be possible to
concatenate the filter when it's a chunked array. This could be benchmarked,
but if I had to guess, concatenating the chunks of the filter and then calling
FilterRecordBatch has a good chance of being faster than FilterTable.
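To make the suggestion concrete, here is a dependency-free toy model of "concatenate the chunked filter once, then filter every column of the batch". All names (`ChunkedMask`, `ConcatenateMask`, `FilterColumn`, `FilterBatchWithChunkedMask`) are hypothetical and are not Arrow APIs. The sketch assumes the concatenated filter has exactly one entry per row of the batch. In Arrow C++ itself, the concatenation step would presumably go through `arrow::Concatenate` over the filter's `ChunkedArray::chunks()` before calling `FilterRecordBatch`. Whether that actually beats `FilterTable` is the benchmark question raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model (not Arrow's API): a "chunked" boolean filter is a
// list of contiguous mask pieces whose lengths sum to the batch length.
using ChunkedMask = std::vector<std::vector<bool>>;

// A "record batch" here is just a set of equal-length contiguous columns.
using Batch = std::vector<std::vector<int64_t>>;

// Step 1: concatenate the filter chunks into one contiguous mask.
// This costs O(filter length) and is done once, regardless of column count.
std::vector<bool> ConcatenateMask(const ChunkedMask& chunks) {
  std::vector<bool> out;
  for (const auto& chunk : chunks) {
    out.insert(out.end(), chunk.begin(), chunk.end());
  }
  return out;
}

// Step 2: filter one contiguous column with the contiguous mask.
// The result is a single contiguous vector, so no chunking survives.
std::vector<int64_t> FilterColumn(const std::vector<int64_t>& values,
                                  const std::vector<bool>& mask) {
  assert(values.size() == mask.size());
  std::vector<int64_t> out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) out.push_back(values[i]);
  }
  return out;
}

// Filter every column of the batch with the (once-concatenated) mask.
Batch FilterBatchWithChunkedMask(const Batch& batch, const ChunkedMask& filter) {
  const std::vector<bool> mask = ConcatenateMask(filter);
  Batch out;
  out.reserve(batch.size());
  for (const auto& column : batch) {
    out.push_back(FilterColumn(column, mask));
  }
  return out;
}
```

Because the output depends only on the concatenated mask, splitting the same mask into chunks as `{{true, false}, {true}, {false, true}}` or keeping it as one chunk `{{true, false, true, false, true}}` yields the same single-chunk result for a batch like `{{1, 2, 3, 4, 5}, {10, 20, 30, 40, 50}}`. That mirrors the point that filtering a RecordBatch has no reason to produce a chunked Table.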
--
This is an automated message from the Apache Git Service.