pitrou commented on code in PR #36906:
URL: https://github.com/apache/arrow/pull/36906#discussion_r1288578701
##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -741,6 +741,63 @@ struct DenseUnionSelectionImpl
}
};
+struct SparseUnionSelectionImpl
+ : public Selection<SparseUnionSelectionImpl, SparseUnionType> {
+ using Base = Selection<SparseUnionSelectionImpl, SparseUnionType>;
+ LIFT_BASE_MEMBERS();
+
+ TypedBufferBuilder<int8_t> child_id_buffer_builder_;
+ std::vector<int8_t> type_codes_;
+
+ SparseUnionSelectionImpl(KernelContext* ctx, const ExecSpan& batch,
+ int64_t output_length, ExecResult* out)
+ : Base(ctx, batch, output_length, out),
+ child_id_buffer_builder_(ctx->memory_pool()),
+ type_codes_(checked_cast<const
UnionType&>(*this->values.type).type_codes()) {}
+
+ template <typename Adapter>
+ Status GenerateOutput() {
+ SparseUnionArray typed_values(this->values.ToArrayData());
+ Adapter adapter(this);
+ RETURN_NOT_OK(adapter.Generate(
+ [&](int64_t index) {
+ int8_t child_id = typed_values.child_id(index);
+ child_id_buffer_builder_.UnsafeAppend(type_codes_[child_id]);
Review Comment:
This seems to be doing a pointless back-and-forth between type codes and
child ids?
```suggestion
child_id_buffer_builder_.UnsafeAppend(typed_values.type_code(index));
```
##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -863,6 +920,22 @@ Status DenseUnionFilterExec(KernelContext* ctx, const
ExecSpan& batch, ExecResul
return FilterExec<DenseUnionSelectionImpl>(ctx, batch, out);
}
+Status SparseUnionFilterExec(KernelContext* ctx, const ExecSpan& batch,
ExecResult* out) {
Review Comment:
Nit: move this into `vector_filter_internal.cc` along `StructFilterExec`?
(can probably also share some code between them...)
##########
docs/source/cpp/compute.rst:
##########
@@ -1680,28 +1680,26 @@ These functions select and return a subset of their
input.
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
| Function name | Arity | Input type 1 | Input type 2 | Output type |
Options class | Notes |
+===============+========+==============+==============+==============+=========================+===========+
-| array_filter | Binary | Any | Boolean | Input type 1 |
:struct:`FilterOptions` | \(1) \(3) |
+| array_filter | Binary | Any | Boolean | Input type 1 |
:struct:`FilterOptions` | \(2) |
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| array_take | Binary | Any | Integer | Input type 1 |
:struct:`TakeOptions` | \(1) \(4) |
+| array_take | Binary | Any | Integer | Input type 1 |
:struct:`TakeOptions` | \(3) |
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| drop_null | Unary | Any | - | Input type 1 |
| \(1) \(2) |
+| drop_null | Unary | Any | - | Input type 1 |
| \(1) |
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| filter | Binary | Any | Boolean | Input type 1 |
:struct:`FilterOptions` | \(1) \(3) |
+| filter | Binary | Any | Boolean | Input type 1 |
:struct:`FilterOptions` | \(3) |
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| take | Binary | Any | Integer | Input type 1 |
:struct:`TakeOptions` | \(1) \(4) |
+| take | Binary | Any | Integer | Input type 1 |
:struct:`TakeOptions` | \(4) |
Review Comment:
I suppose this should be
```suggestion
| take | Binary | Any | Integer | Input type 1 |
:struct:`TakeOptions` | \(3) |
```
##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -863,6 +920,22 @@ Status DenseUnionFilterExec(KernelContext* ctx, const
ExecSpan& batch, ExecResul
return FilterExec<DenseUnionSelectionImpl>(ctx, batch, out);
}
+Status SparseUnionFilterExec(KernelContext* ctx, const ExecSpan& batch,
ExecResult* out) {
+ // Transform filter to selection indices and then use Take.
+ std::shared_ptr<ArrayData> indices;
+ RETURN_NOT_OK(GetTakeIndices(batch[1].array,
+ FilterState::Get(ctx).null_selection_behavior,
+ ctx->memory_pool())
+ .Value(&indices));
+
+ Datum result;
+ RETURN_NOT_OK(Take(batch[0].array.ToArrayData(), Datum(indices),
Review Comment:
Perhaps we can call `SparseUnionTakeExec` directly instead of going through
the function lookup and execution machinery again?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]