pitrou commented on code in PR #36906:
URL: https://github.com/apache/arrow/pull/36906#discussion_r1288578701


##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -741,6 +741,63 @@ struct DenseUnionSelectionImpl
   }
 };
 
+struct SparseUnionSelectionImpl
+    : public Selection<SparseUnionSelectionImpl, SparseUnionType> {
+  using Base = Selection<SparseUnionSelectionImpl, SparseUnionType>;
+  LIFT_BASE_MEMBERS();
+
+  TypedBufferBuilder<int8_t> child_id_buffer_builder_;
+  std::vector<int8_t> type_codes_;
+
+  SparseUnionSelectionImpl(KernelContext* ctx, const ExecSpan& batch,
+                           int64_t output_length, ExecResult* out)
+      : Base(ctx, batch, output_length, out),
+        child_id_buffer_builder_(ctx->memory_pool()),
+        type_codes_(checked_cast<const 
UnionType&>(*this->values.type).type_codes()) {}
+
+  template <typename Adapter>
+  Status GenerateOutput() {
+    SparseUnionArray typed_values(this->values.ToArrayData());
+    Adapter adapter(this);
+    RETURN_NOT_OK(adapter.Generate(
+        [&](int64_t index) {
+          int8_t child_id = typed_values.child_id(index);
+          child_id_buffer_builder_.UnsafeAppend(type_codes_[child_id]);

Review Comment:
   This seems to be doing a pointless back-and-forth between type codes and 
child ids?
   ```suggestion
             
child_id_buffer_builder_.UnsafeAppend(typed_values.type_code(index));
   ```



##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -863,6 +920,22 @@ Status DenseUnionFilterExec(KernelContext* ctx, const 
ExecSpan& batch, ExecResul
   return FilterExec<DenseUnionSelectionImpl>(ctx, batch, out);
 }
 
+Status SparseUnionFilterExec(KernelContext* ctx, const ExecSpan& batch, 
ExecResult* out) {

Review Comment:
   Nit: move this into `vector_filter_internal.cc` along `StructFilterExec`? 
(can probably also share some code between them...)



##########
docs/source/cpp/compute.rst:
##########
@@ -1680,28 +1680,26 @@ These functions select and return a subset of their 
input.
 
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
 | Function name | Arity  | Input type 1 | Input type 2 | Output type  | 
Options class           | Notes     |
 
+===============+========+==============+==============+==============+=========================+===========+
-| array_filter  | Binary | Any          | Boolean      | Input type 1 | 
:struct:`FilterOptions` | \(1) \(3) |
+| array_filter  | Binary | Any          | Boolean      | Input type 1 | 
:struct:`FilterOptions` | \(2)      |
 
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| array_take    | Binary | Any          | Integer      | Input type 1 | 
:struct:`TakeOptions`   | \(1) \(4) |
+| array_take    | Binary | Any          | Integer      | Input type 1 | 
:struct:`TakeOptions`   | \(3)      |
 
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| drop_null     | Unary  | Any          | -            | Input type 1 |        
                 | \(1) \(2) |
+| drop_null     | Unary  | Any          | -            | Input type 1 |        
                 | \(1)      |
 
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| filter        | Binary | Any          | Boolean      | Input type 1 | 
:struct:`FilterOptions` | \(1) \(3) |
+| filter        | Binary | Any          | Boolean      | Input type 1 | 
:struct:`FilterOptions` | \(3)      |
 
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
-| take          | Binary | Any          | Integer      | Input type 1 | 
:struct:`TakeOptions`   | \(1) \(4) |
+| take          | Binary | Any          | Integer      | Input type 1 | 
:struct:`TakeOptions`   | \(4)      |

Review Comment:
   I suppose this should be
   ```suggestion
   | take          | Binary | Any          | Integer      | Input type 1 | 
:struct:`TakeOptions`   | \(3)      |
   ```



##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -863,6 +920,22 @@ Status DenseUnionFilterExec(KernelContext* ctx, const 
ExecSpan& batch, ExecResul
   return FilterExec<DenseUnionSelectionImpl>(ctx, batch, out);
 }
 
+Status SparseUnionFilterExec(KernelContext* ctx, const ExecSpan& batch, 
ExecResult* out) {
+  // Transform filter to selection indices and then use Take.
+  std::shared_ptr<ArrayData> indices;
+  RETURN_NOT_OK(GetTakeIndices(batch[1].array,
+                               FilterState::Get(ctx).null_selection_behavior,
+                               ctx->memory_pool())
+                    .Value(&indices));
+
+  Datum result;
+  RETURN_NOT_OK(Take(batch[0].array.ToArrayData(), Datum(indices),

Review Comment:
   Perhaps we can call `SparseUnionTakeExec` directly instead of going through 
the function lookup and execution machinery again?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to