[ https://issues.apache.org/jira/browse/ARROW-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirill Lykov updated ARROW-10197:
---------------------------------
Comment: was deleted
(was: I fixed that and the code now compiles, but I get an error at runtime:
Traceback (most recent call last):
  File "bla.py", line 36, in <module>
    r, = projector.evaluate(table.to_batches()[0], filterResult)
  File "pyarrow/gandiva.pyx", line 156, in pyarrow.gandiva.Projector.evaluate
    check_status(self.projector.get().Evaluate(
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
pyarrow.lib.ArrowInvalid: llvm expression built for selection vector mode 0 received vector with mode 2
It looks like the error comes from this line:
[https://github.com/apache/arrow/blob/5d3dbd05ab48b22423cb4c1bda78da8d1f34b921/cpp/src/gandiva/llvm_generator.cc#L112]
However, I don't understand the semantics of this error well enough. It looks
like something is wrong with the Python test code.)
> [Gandiva][python] Execute expression on filtered data
> -----------------------------------------------------
>
> Key: ARROW-10197
> URL: https://issues.apache.org/jira/browse/ARROW-10197
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++ - Gandiva, Python
> Reporter: Kirill Lykov
> Priority: Major
> Fix For: 3.0.0
>
>
> It looks like there is no way to execute an expression on filtered data in
> Python.
> Specifically, I cannot pass a `SelectionVector` to the projector's `evaluate` method:
> ```python
> import pyarrow as pa
> import pyarrow.gandiva as gandiva
>
> table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
>                               pa.array([5., 45., 36., 73., 83., 23., 76.])],
>                              ['a', 'b'])
>
> builder = gandiva.TreeExprBuilder()
> node_a = builder.make_field(table.schema.field("a"))
> node_b = builder.make_field(table.schema.field("b"))
> fifty = builder.make_literal(50.0, pa.float64())
> eleven = builder.make_literal(11.0, pa.float64())
>
> # Filter condition: (a < 50 and a > b) or b < 11
> cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
> cond_2 = builder.make_function("greater_than", [node_a, node_b],
>                                pa.bool_())
> cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
> cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
> condition = builder.make_condition(cond)
>
> filter = gandiva.make_filter(table.schema, condition)
> # filterResult has type SelectionVector
> filterResult = filter.evaluate(table.to_batches()[0],
>                                pa.default_memory_pool())
> print(filterResult)
>
> # Projection: c = a + b
> sum = builder.make_function("add", [node_a, node_b], pa.float64())
> field_result = pa.field("c", pa.float64())
> expr = builder.make_expression(sum, field_result)
> projector = gandiva.make_projector(
>     table.schema, [expr], pa.default_memory_pool())
>
> # Problem: I don't know how to use filterResult with the projector,
> # because `evaluate` does not accept a SelectionVector here.
> r, = projector.evaluate(table.to_batches()[0], filterResult)
> ```
> In C++, I see that it is possible to pass a `SelectionVector` as the second
> argument to `Projector::Evaluate`:
> [https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270]
>
> Meanwhile, it looks like it is impossible in `gandiva.pyx`:
> [https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154]
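
For reference, the behaviour requested above can be modelled in plain Python, without any Gandiva calls. This is only a sketch of the intended semantics, using the same data and condition as the snippet in the description. The helper names `evaluate_filter` and `evaluate_projection` are hypothetical and are not part of pyarrow. The filter returns the indices of the matching rows (the role a `SelectionVector` plays), and the projection computes `a + b` only for those rows.

```python
# Plain-Python model of "filter, then project only the selected rows".
# No Gandiva calls are used; the helper names below are hypothetical.

a = [1., 31., 46., 3., 57., 44., 22.]
b = [5., 45., 36., 73., 83., 23., 76.]


def evaluate_filter(a, b):
    """Return indices of rows where (a < 50 and a > b) or b < 11."""
    return [i for i, (x, y) in enumerate(zip(a, b))
            if (x < 50.0 and x > y) or y < 11.0]


def evaluate_projection(a, b, selection=None):
    """Compute c = a + b, restricted to `selection` when one is given."""
    rows = range(len(a)) if selection is None else selection
    return [a[i] + b[i] for i in rows]


selection = evaluate_filter(a, b)
c = evaluate_projection(a, b, selection)
print(selection)  # [0, 2, 5]
print(c)          # [6.0, 82.0, 67.0]
```

With no selection passed, `evaluate_projection(a, b)` computes the sum for all seven rows, which corresponds to what the Python `evaluate` method in the description can do today.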