[ https://issues.apache.org/jira/browse/ARROW-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030696#comment-17030696 ]
Joris Van den Bossche commented on ARROW-1796:
----------------------------------------------

I think we can close this issue, since this is now possible with the dataset API? (We can open a separate issue about actually using this in the {{pyarrow.parquet.read_table}} filter argument.)

> [Python] RowGroup filtering on file level
> -----------------------------------------
>
>                 Key: ARROW-1796
>                 URL: https://issues.apache.org/jira/browse/ARROW-1796
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Uwe Korn
>            Assignee: Uwe Korn
>            Priority: Major
>              Labels: parquet, pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We can build upon the API defined in {{fastparquet}} for defining RowGroup
> filters:
> https://github.com/dask/fastparquet/blob/master/fastparquet/api.py#L296-L300
> and translate them into the C++ enums we will define in
> https://issues.apache.org/jira/browse/PARQUET-1158 . This should enable us to
> provide the user with a simple predicate pushdown API that we can extend in
> the background from RowGroup to Page level later on.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
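
For readers unfamiliar with RowGroup-level predicate pushdown as described in the issue, the following is a minimal pure-Python sketch of the idea, not the Arrow or fastparquet implementation. It assumes filters are given as (column, op, value) tuples in the style of {{fastparquet}}, and that per-RowGroup min/max statistics are available as plain dictionaries. The names {{row_group_may_match}}, {{select_row_groups}} and {{row_group_stats}} are hypothetical and exist only for this illustration.

```python
def row_group_may_match(stats, filters):
    """Return False only when min/max statistics prove no row can match.

    stats:   hypothetical dict mapping column name -> (min, max)
    filters: list of (column, op, value) tuples, all of which must hold
    """
    for column, op, value in filters:
        if column not in stats:
            # Without statistics the row group cannot be ruled out.
            continue
        lo, hi = stats[column]
        if op in ("=", "=="):
            possible = lo <= value <= hi
        elif op == "!=":
            # Only impossible when every value in the group equals `value`.
            possible = not (lo == hi == value)
        elif op == "<":
            possible = lo < value
        elif op == "<=":
            possible = lo <= value
        elif op == ">":
            possible = hi > value
        elif op == ">=":
            possible = hi >= value
        else:
            raise ValueError(f"unsupported operator: {op!r}")
        if not possible:
            return False
    return True


def select_row_groups(all_stats, filters):
    """Return the indices of row groups that still have to be read."""
    return [
        index
        for index, stats in enumerate(all_stats)
        if row_group_may_match(stats, filters)
    ]


# Made-up statistics for three row groups of an imaginary file.
row_group_stats = [
    {"x": (0, 9)},
    {"x": (10, 19)},
    {"x": (20, 29)},
]

# Keep only row groups that could contain rows with 12 < x <= 20.
print(select_row_groups(row_group_stats, [("x", ">", 12), ("x", "<=", 20)]))
# prints [1, 2]
```

The check is deliberately conservative: a selected RowGroup may still contain no matching rows, so the rows that are read need to be filtered again. The same test could in principle be applied to page-level statistics, which is the later extension the issue mentions.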