[jira] [Commented] (ARROW-8039) [C++][Python][Dataset] Assemble a minimal ParquetDataset shim

Joris Van den Bossche (Jira) Mon, 23 Mar 2020 08:53:46 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064894#comment-17064894
 ]


Joris Van den Bossche commented on ARROW-8039:
----------------------------------------------

I expanded my existing PR for {{read_table}} 
(https://github.com/apache/arrow/pull/6303) with a small ParquetDataset shim 
that should at least make the basic {{ParquetDataset(..).read()}} call work. 

For now I added a {{use_dataset}} keyword ({{True}}/{{False}}, defaulting to 
{{False}}), so you can opt in to using the new dataset API under the hood (it 
also lets me exercise this in the tests). The final end-user API we want to 
provide for this still needs to be discussed.
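
To make the opt-in idea concrete, here is a rough sketch of how such a keyword 
could switch backends behind a single {{read()}} entry point. This is not the 
code in the PR: the class name {{ParquetDatasetShim}} and both reader functions 
are hypothetical placeholders that return plain Python rows instead of touching 
pyarrow, and only the {{use_dataset}} keyword and its default of False come 
from the comment above.

```python
from typing import Callable, Dict, List

# Stand-in for a real table type; an actual shim would return a pyarrow.Table.
Rows = List[Dict[str, object]]


def read_with_legacy_code(path: str) -> Rows:
    # Hypothetical placeholder for the existing ParquetDataset code path.
    return [{"path": path, "backend": "legacy"}]


def read_with_dataset_api(path: str) -> Rows:
    # Hypothetical placeholder for a reader built on the new dataset API.
    return [{"path": path, "backend": "dataset"}]


class ParquetDatasetShim:
    """Hypothetical shim: one read() entry point, selectable backend."""

    def __init__(self, path: str, use_dataset: bool = False) -> None:
        self.path = path
        # The default (False) keeps the current behaviour; True opts in
        # to the new backend.
        self._reader: Callable[[str], Rows] = (
            read_with_dataset_api if use_dataset else read_with_legacy_code
        )

    def read(self) -> Rows:
        # Delegate to whichever backend was chosen at construction time.
        return self._reader(self.path)
```

With this layout, existing callers that never pass the keyword keep the legacy 
path, while tests can construct the shim with {{use_dataset=True}} to exercise 
the new one.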

> [C++][Python][Dataset] Assemble a minimal ParquetDataset shim
> -------------------------------------------------------------
>
>                 Key: ARROW-8039
>                 URL: https://issues.apache.org/jira/browse/ARROW-8039
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++ - Dataset, Python
>    Affects Versions: 0.16.0
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>             Fix For: 0.17.0
>
>
> Assemble a minimal ParquetDataset shim backed by {{pyarrow.dataset.*}}. 
> Replace the existing ParquetDataset with the shim by default, and allow 
> opt-out for users who need the current ParquetDataset.
> This is mostly exploratory, to see which of the Python tests fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
