[ https://issues.apache.org/jira/browse/ARROW-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406591#comment-17406591 ]
Joris Van den Bossche commented on ARROW-12060:
-----------------------------------------------

A quick demo of what a public "Expression.call" currently gives, using a compute kernel (log10) that is not directly exposed in the dataset Expression class:

{code:python}
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.table({'a': range(10)})
ds.write_dataset(table, "test_dataset_expression.parquet", format="feather")
dataset = ds.dataset("test_dataset_expression.parquet/", format="feather")

>>> f = ds.field("a")

# creating expressions
>>> ds.Expression.call("log10", [f])
<pyarrow.dataset.Expression log10(a)>
>>> ds.Expression.call("log10", [f]) > 1
<pyarrow.dataset.Expression (log10(a) > 1)>

# using it to project/filter datasets
>>> dataset.to_table(columns={'a': ds.field("a"), 'a_log':
...     ds.Expression.call("log10", [ds.field('a')])}).to_pandas()
   a     a_log
0  0      -inf
1  1  0.000000
2  2  0.301030
3  3  0.477121
4  4  0.602060
5  5  0.698970
6  6  0.778151
7  7  0.845098
8  8  0.903090
9  9  0.954243

>>> dataset.to_table(columns={'a': ds.field("a"), 'a_log':
...     ds.Expression.call("log10", [ds.field('a')])},
...     filter=ds.Expression.call("log10", [ds.field('a')]) > 0.5).to_pandas()
   a     a_log
0  4  0.602060
1  5  0.698970
2  6  0.778151
3  7  0.845098
4  8  0.903090
5  9  0.954243
{code}

So using a compute kernel in pyarrow.dataset this way seems to work.

However, it doesn't give a very nice user experience: it is basically the equivalent of {{pc.call_function(..)}}, so e.g. {{pc.call_function("log10", [...])}} instead of {{pc.log10(...)}}. That also means that several of the niceties of the Python wrapper functions are not available (e.g. validation of some arguments, not having to pass the arguments as a list, passing options as keywords instead of as an options class instance, etc).

I think ideally we would be able to use the compute wrappers directly? Like {{pc.log10(ds.field('a'))}}? Or what would be our preferred user API?

> [Python] Enable calling compute functions on Expressions
> --------------------------------------------------------
>
>                 Key: ARROW-12060
>                 URL: https://issues.apache.org/jira/browse/ARROW-12060
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Major
>             Fix For: 6.0.0
>
>
> To expose the full power of dataset (projection/filter) expressions, we
> should ensure that all compute kernels can be used in combination with
> expressions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)