[ https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056479#comment-17056479 ]
Joris Van den Bossche commented on ARROW-8039:
----------------------------------------------
To make this more specific: if essentially the only thing such a
ParquetDataset would support is a "read()" method, I am not sure what
benefit the class would offer over the {{parquet.read_table}} function when
designing a new API, since that function also supports reading with column
selection and a row filter.
I have actually already implemented support for this in {{read_table}} in
https://github.com/apache/arrow/pull/6303
> [C++][Python][Dataset] Assemble a minimal ParquetDataset shim
> -------------------------------------------------------------
>
> Key: ARROW-8039
> URL: https://issues.apache.org/jira/browse/ARROW-8039
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++ - Dataset, Python
> Affects Versions: 0.16.0
> Reporter: Ben Kietzman
> Assignee: Ben Kietzman
> Priority: Major
> Fix For: 0.17.0
>
>
> Assemble a minimal ParquetDataset shim backed by {{pyarrow.dataset.*}}.
> Replace the existing ParquetDataset with the shim by default, and allow
> opt-out for users who need the current ParquetDataset.
> This is mostly exploratory, to see which of the Python tests fail.