felipecrv commented on issue #41094:
URL: https://github.com/apache/arrow/issues/41094#issuecomment-2150787705
> I think both 2 and 3 could potentially benefit from leveraging special
attributes of specific data types such as list/string-view, ree and dict,
though I'm not exactly sure how.
The leverage is always a function of type+kernel. The first kernels that
deserve good specialization are `array_take` (the "gather") and the scatter.
For types that support out-of-order writing (list-view, dict,
string-view...) you can scatter incrementally:
```cpp
scatter(branch0, sel0, &output);
scatter(branch1, sel1, &output);
...
scatter(branchn, seln, &output);
```
(this assumes all the selection vectors are disjoint)
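To make the incremental scatter concrete, here is a minimal sketch on plain `std::vector`s rather than Arrow arrays. `Scatter` and `ScatterExample` are hypothetical names, not Arrow APIs, and the sketch assumes each branch holds compacted values (one per entry of its selection vector) and that the output is pre-sized:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not an Arrow API: writes values[k] into
// (*output)[sel[k]]. Assumes `values` is compacted (one value per selected
// row) and that `output` already has its final length.
template <typename T>
void Scatter(const std::vector<T>& values, const std::vector<int64_t>& sel,
             std::vector<T>* output) {
  assert(values.size() == sel.size());
  for (size_t k = 0; k < sel.size(); ++k) {
    assert(sel[k] >= 0 && static_cast<size_t>(sel[k]) < output->size());
    (*output)[static_cast<size_t>(sel[k])] = values[k];
  }
}

// Two branches with disjoint selection vectors fill one output of length 5.
std::vector<std::string> ScatterExample() {
  std::vector<std::string> output(5);
  Scatter<std::string>({"a", "b", "c"}, {0, 2, 4}, &output);  // branch0
  Scatter<std::string>({"x", "y"}, {1, 3}, &output);          // branch1
  return output;
}
```

Because the selection vectors are disjoint, the calls never overwrite each other and can run in any order, which is what makes one-branch-at-a-time evaluation possible for these types.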
For types that need in-order appending, you will need all selection vectors
and merge them:
```cpp
selections = MinHeapOfSelections{
    {branch0, sel0, 0}, {branch1, sel1, 0}, ..., {branchn, seln, 0}};
while (!selections.empty()) {
  i = min_selection(selections);
  output_builder.AppendFrom(selections[i].branch, selections[i].pos);
  selections.ExtractMin(/*hint=*/i);
}
```
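The in-order merge can be sketched as a k-way merge over a min-heap of cursors. This is only an illustration on plain `std::vector`s: `Branch` and `MergeInOrder` are hypothetical names, `std::priority_queue` stands in for `MinHeapOfSelections`, and the sketch assumes each selection vector is sorted, the vectors are disjoint, and each branch stores compacted values:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a branch: compacted values plus the sorted
// output row indices (selection vector) they belong to.
struct Branch {
  std::vector<int> values;
  std::vector<int64_t> sel;
};

// Appends values in increasing output-row order, as an append-only builder
// would require.
std::vector<int> MergeInOrder(const std::vector<Branch>& branches) {
  // Cursor = (next output row, branch index, offset within that branch).
  using Cursor = std::tuple<int64_t, size_t, size_t>;
  std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;
  for (size_t b = 0; b < branches.size(); ++b) {
    assert(branches[b].values.size() == branches[b].sel.size());
    if (!branches[b].sel.empty()) {
      heap.emplace(branches[b].sel[0], b, size_t{0});
    }
  }
  std::vector<int> output;
  while (!heap.empty()) {
    const Cursor cur = heap.top();  // cursor with the smallest next row
    heap.pop();
    const size_t b = std::get<1>(cur);
    const size_t k = std::get<2>(cur);
    output.push_back(branches[b].values[k]);  // in-order append
    if (k + 1 < branches[b].sel.size()) {
      // Re-insert the branch, keyed by its next selected row.
      heap.emplace(branches[b].sel[k + 1], b, k + 1);
    }
  }
  return output;
}
```

For example, merging branches `{values: {10, 30, 50}, sel: {0, 2, 4}}` and `{values: {20, 40}, sel: {1, 3}}` appends 10, 20, 30, 40, 50 in that order. Unlike the scatter, nothing can be written until every branch and selection vector is available.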
These branches and selection vectors are described in this comment about
the evaluation of case-when:
https://github.com/apache/arrow/issues/41453#issuecomment-2150735331
> I'm now working on an overall framework, maybe things will become clearer
when I get there. I can use some help/comment from you guys then :)
The details will only become clear once you try to draft a big change, for
sure. But for the sake of review, an interesting first step would be
representing the `cond` special form in `compute::Expression` with
strict/eager evaluation, which you later replace.