[
https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064894#comment-17064894
]
Joris Van den Bossche commented on ARROW-8039:
----------------------------------------------
I expanded my existing PR for {{read_table}}
(https://github.com/apache/arrow/pull/6303) with a small ParquetDataset shim
that should at least make the basic {{ParquetDataset(..).read()}} work.
For now I added a {{use_dataset=False/True}} keyword (defaulting to False), so
you can opt in to the new dataset API under the hood (this also lets me
exercise it in the tests). The final end-user API we want to provide for this
still needs to be discussed.
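
A minimal sketch of how an opt-in keyword like this could switch between the two code paths. It is not the code from the PR: the class name {{ParquetDatasetShim}}, the two helper functions and the injectable {{legacy_reader}}/{{dataset_reader}} arguments are assumptions made for illustration. Only {{pyarrow.parquet.read_table}} and {{pyarrow.dataset.dataset(...).to_table()}} are existing pyarrow calls.

```python
def _read_legacy(path):
    # Existing code path: the pyarrow.parquet reader.
    import pyarrow.parquet as pq
    return pq.read_table(path)


def _read_with_dataset(path):
    # New code path: discover the files and read them via pyarrow.dataset.
    import pyarrow.dataset as ds
    return ds.dataset(path, format="parquet").to_table()


class ParquetDatasetShim:
    """Hypothetical stand-in for ParquetDataset; only read() is covered."""

    def __init__(self, path, use_dataset=False,
                 legacy_reader=_read_legacy,
                 dataset_reader=_read_with_dataset):
        self.path = path
        self.use_dataset = use_dataset
        # Pick the backend once; the default (False) keeps the old behaviour.
        self._reader = dataset_reader if use_dataset else legacy_reader

    def read(self):
        # Delegate to whichever backend was selected at construction time.
        return self._reader(self.path)
```

With pyarrow installed, {{ParquetDatasetShim(path, use_dataset=True).read()}} would go through the dataset API, while omitting the keyword keeps the existing reader. Making the readers injectable is only a convenience so that the dispatch can be tested without touching any files.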
> [C++][Python][Dataset] Assemble a minimal ParquetDataset shim
> -------------------------------------------------------------
>
> Key: ARROW-8039
> URL: https://issues.apache.org/jira/browse/ARROW-8039
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++ - Dataset, Python
> Affects Versions: 0.16.0
> Reporter: Ben Kietzman
> Assignee: Ben Kietzman
> Priority: Major
> Fix For: 0.17.0
>
>
> Assemble a minimal ParquetDataset shim backed by {{pyarrow.dataset.*}}.
> Replace the existing ParquetDataset with the shim by default, and allow opt-out
> for users who need the current ParquetDataset.
> This is mostly exploratory, to see which of the Python tests fail.