[jira] [Commented] (ARROW-8039) [C++][Python][Dataset] Assemble a minimal ParquetDataset shim

Neal Richardson (Jira) Tue, 10 Mar 2020 15:48:10 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056487#comment-17056487
 ]


Neal Richardson commented on ARROW-8039:
----------------------------------------

Ah, good call. That sounds reasonable to me (as someone who is not a user). And 
it looks easy enough to promote only read_table, and not mention 
ParquetDataset, in 
https://arrow.apache.org/docs/python/parquet.html#partitioned-datasets-multiple-files

So the idea would be that read_table is the function that gets the new 
Dataset option, and ParquetDataset stays unchanged (just no longer 
encouraged for use). 
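
As a rough sketch of that idea, the dispatch could look something like the 
following. This is only an illustration: the flag name use_new_dataset and 
both helper functions are hypothetical stand-ins, not pyarrow's actual API, 
and the opt-in default is an assumption (the issue description proposes the 
opposite default, with an opt-out).

```python
# Hypothetical sketch: none of these names are pyarrow's actual API.
# The two helpers stand in for the real read paths so the dispatch
# logic can be shown without depending on pyarrow.


def _read_with_legacy_parquet_dataset(path):
    # Stand-in for the existing ParquetDataset-based code path,
    # which is left unchanged.
    return {"backend": "legacy ParquetDataset", "path": path}


def _read_with_new_dataset_api(path):
    # Stand-in for a code path backed by pyarrow.dataset.
    return {"backend": "pyarrow.dataset", "path": path}


def read_table(path, use_new_dataset=False):
    """Read a (possibly partitioned) dataset from `path`.

    `use_new_dataset` is an assumed flag name for the new Dataset option.
    """
    if use_new_dataset:
        return _read_with_new_dataset_api(path)
    return _read_with_legacy_parquet_dataset(path)


# Existing callers keep the current behavior; new callers opt in.
result_default = read_table("data/")
result_opt_in = read_table("data/", use_new_dataset=True)
```

With this shape, only read_table grows a new keyword, and code that 
constructs ParquetDataset directly is unaffected.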

[~wesm] thoughts?

> [C++][Python][Dataset] Assemble a minimal ParquetDataset shim
> -------------------------------------------------------------
>
>                 Key: ARROW-8039
>                 URL: https://issues.apache.org/jira/browse/ARROW-8039
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++ - Dataset, Python
>    Affects Versions: 0.16.0
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>             Fix For: 0.17.0
>
>
> Assemble a minimal ParquetDataset shim backed by {{pyarrow.dataset.*}}. 
> Replace the existing ParquetDataset with the shim by default, and allow 
> opt-out for users who need the current ParquetDataset.
> This is mostly exploratory, to see which of the Python tests fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
