[jira] [Commented] (ARROW-8039) [C++][Python][Dataset] Assemble a minimal ParquetDataset shim

Joris Van den Bossche (Jira) Tue, 10 Mar 2020 15:34:24 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056479#comment-17056479
 ]


Joris Van den Bossche commented on ARROW-8039:
----------------------------------------------

To make this more specific: if the only thing such a ParquetDataset would 
support is a "read()" method, I am not sure what benefit the class would 
offer, when designing a new API, over the {{parquet.read_table}} function, 
which also supports reading with column selection and a row filter.

I actually already added support for this in {{read_table}} in 
https://github.com/apache/arrow/pull/6303

> [C++][Python][Dataset] Assemble a minimal ParquetDataset shim
> -------------------------------------------------------------
>
>                 Key: ARROW-8039
>                 URL: https://issues.apache.org/jira/browse/ARROW-8039
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++ - Dataset, Python
>    Affects Versions: 0.16.0
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>             Fix For: 0.17.0
>
>
> Assemble a minimal ParquetDataset shim backed by {{pyarrow.dataset.*}}. 
> Replace the existing ParquetDataset with the shim by default, allow opt-out 
> for users who need the current ParquetDataset.
> This is mostly exploratory to see which of the Python tests fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
