[ https://issues.apache.org/jira/browse/ARROW-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208803#comment-17208803 ]
Kirill Lykov commented on ARROW-10197:
--------------------------------------

I've tried to fix it by adding:

```python
def evaluate(self, RecordBatch batch, shared_ptr[CSelectionVector] selection):
    cdef vector[shared_ptr[CArray]] results
    # Attempted four-argument call: batch, selection vector, pool, output
    check_status(self.projector.get().Evaluate(
        batch.sp_batch.get()[0], selection.get(), self.pool.pool, &results))
    cdef shared_ptr[CArray] result
    arrays = []
    for result in results:
        arrays.append(pyarrow_wrap_array(result))
    return arrays
```

But I get the error: `Call with wrong number of arguments (expected 3, got 4)`

This suggests that I don't understand how this pyx is translated to Python. I thought `self.projector.get().Evaluate` was somehow magically translated into a call to this method:
[https://github.com/apache/arrow/blob/7ad49eeca5215d9b2a56b6439f1bd6ea38888ea9/cpp/src/gandiva/projector.h#L106]

> [Gandiva][python] Execute expression on filtered data
> -----------------------------------------------------
>
>                 Key: ARROW-10197
>                 URL: https://issues.apache.org/jira/browse/ARROW-10197
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Gandiva, Python
>            Reporter: Kirill Lykov
>            Priority: Trivial
>
> Looks like there is no way to execute an expression on filtered data in python.
> Basically, I cannot pass `SelectionVector` to projector's `evaluate` method:
> ```python
> import pyarrow as pa
> import pyarrow.gandiva as gandiva
>
> table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
>                               pa.array([5., 45., 36., 73., 83., 23., 76.])],
>                              ['a', 'b'])
> builder = gandiva.TreeExprBuilder()
> node_a = builder.make_field(table.schema.field("a"))
> node_b = builder.make_field(table.schema.field("b"))
> fifty = builder.make_literal(50.0, pa.float64())
> eleven = builder.make_literal(11.0, pa.float64())
> cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
> cond_2 = builder.make_function("greater_than", [node_a, node_b], pa.bool_())
> cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
> cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
> condition = builder.make_condition(cond)
> filter = gandiva.make_filter(table.schema, condition)
> # filterResult has type SelectionVector
> filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
> print(filterResult)
> sum = builder.make_function("add", [node_a, node_b], pa.float64())
> field_result = pa.field("c", pa.float64())
> expr = builder.make_expression(sum, field_result)
> projector = gandiva.make_projector(
>     table.schema, [expr], pa.default_memory_pool())
> # The problem: I don't know how to use filterResult with the projector
> r, = projector.evaluate(table.to_batches()[0], filterResult)
> ```
> In C++, I see that it is possible to pass SelectionVector as the second argument to projector::Evaluate:
> [https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270]
>
> Meanwhile, it looks like it is impossible in `gandiva.pyx`:
> [https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
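
For reference, the behaviour requested in the issue can be mimicked in plain Python. This is only a sketch with no Gandiva involved. It assumes a selection vector amounts to a list of the row indices that passed the filter, and that a projector given such a vector evaluates its expression only on those rows. The helper names `evaluate_filter` and `evaluate_projection` are hypothetical and are not part of `pyarrow.gandiva`.

```python
# Pure-Python stand-in for the filter + projector pipeline from the issue.
# Same data as the issue's table: columns 'a' and 'b'.
a = [1., 31., 46., 3., 57., 44., 22.]
b = [5., 45., 36., 73., 83., 23., 76.]


def evaluate_filter(a, b):
    """Return indices of rows where (a < 50 and a > b) or b < 11."""
    return [i for i, (x, y) in enumerate(zip(a, b))
            if (x < 50.0 and x > y) or y < 11.0]


def evaluate_projection(a, b, selection=None):
    """Compute a + b for every row, or only for the rows in `selection`."""
    rows = range(len(a)) if selection is None else selection
    return [a[i] + b[i] for i in rows]


selection = evaluate_filter(a, b)
filtered_sum = evaluate_projection(a, b, selection)
print(selection)     # [0, 2, 5]
print(filtered_sum)  # [6.0, 82.0, 67.0]
```

With `selection=None` the sketch falls back to evaluating every row, which corresponds to the projector call that is already available from Python.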