jorisvandenbossche commented on issue #39311:
URL: https://github.com/apache/arrow/issues/39311#issuecomment-1864504639
For me there is a bit of a disconnect between the issue title and description.
The title mentions a "take" kernel, but "take" is an existing vector kernel (to
take values from a table/array based on pre-computed indices). I don't think you
can use the "take" kernel at the moment in an Acero plan context (there is a
FilterNode that does a take under the hood, but that's selecting rows based on
a predicate _expression_, not based on pre-computed indices). But I assume this
is not the kind of "take" you mean here?
My understanding is that you rather want some scalar take-like kernel that
works on a list of elements and takes certain values from that list? So
essentially like the existing "list_element(lists, index)" kernel, but then
allowing multiple indices instead of only a single index?
The "list_element" kernel almost works for the case where you only want to
select one element of the list:
```python
import pyarrow as pa
from pyarrow.acero import Declaration, TableSourceNodeOptions, ProjectNodeOptions
import pyarrow.compute as pc

# Column "a" holds the lists, column "b" the per-row index to select.
table = pa.table({'a': [[1, 2, 3], [1, 2, 3], [1, 2, 3]], 'b': [0, 2, 1]})
decl = Declaration.from_sequence([
    Declaration("table_source", TableSourceNodeOptions(table)),
    Declaration("project",
                ProjectNodeOptions([pc.list_element(pc.field("a"), pc.field("b"))]))
])
decl.to_table()
```
The only reason this doesn't work is that the "list_element" kernel isn't
yet implemented for the variant where the second argument (the index) is an
array (one index per row) instead of a scalar (i.e. a constant index for all rows).
You could have something similar for a potential "list_take" kernel.
This does assume you have first computed those lists and indices. So that
means that you always first materialize the full list in a groupby step, and
only then, in a second step, take a subset of the list. I am not sure there would
be a way to integrate that in a single groupby call (without using a custom UDF
for the groupby aggregation).