[jira] [Commented] (ARROW-2801) [Python][C++][Dataset] Implement split_row_groups for ParquetDataset

Wes McKinney (Jira) Fri, 10 Apr 2020 09:05:46 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080605#comment-17080605
 ]


Wes McKinney commented on ARROW-2801:
-------------------------------------

If this is documented, then this issue can be closed. Note the reporter's 
comment: "An easy and efficient way to implement this is by using the summary 
metadata file instead of opening every footer file" -- so this is partially 
about reading large datasets more efficiently. We might wait until dataset 
generation from the _metadata file is possible.
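
For illustration only, here is a rough sketch of what the reporter's suggestion 
could look like: enumerate one split per row group from the summary metadata 
alone, without opening any data file footer. The names RowGroupSplit, 
iter_row_group_splits and load_summary are hypothetical and are not part of 
pyarrow. The sketch assumes that the _metadata file was written with relative 
file paths set on its column chunks, and that the row groups of each file 
appear in their original order.

```python
from typing import Iterator, NamedTuple


class RowGroupSplit(NamedTuple):
    """One unit of work: a single row group in one data file (hypothetical type)."""
    file_path: str
    row_group: int  # index of the row group within its own file
    num_rows: int


def iter_row_group_splits(metadata) -> Iterator[RowGroupSplit]:
    """Yield one split per row group recorded in a summary metadata object.

    `metadata` is expected to behave like pyarrow.parquet.FileMetaData:
    it exposes `num_row_groups` and `row_group(i)`, and each row group
    exposes `num_rows` and `column(j).file_path`.
    """
    seen = {}  # file_path -> number of row groups already seen for that file
    for i in range(metadata.num_row_groups):
        rg = metadata.row_group(i)
        # Assumption: every column chunk of a row group lives in the same file,
        # so the first column's file_path identifies the data file.
        path = rg.column(0).file_path
        index_in_file = seen.get(path, 0)
        seen[path] = index_in_file + 1
        yield RowGroupSplit(path, index_in_file, rg.num_rows)


def load_summary(path):
    """Read only the summary footer; no data file is opened."""
    import pyarrow.parquet as pq  # lazy import so the sketch loads without pyarrow
    return pq.read_metadata(path)
```

With pyarrow installed, something like 
list(iter_row_group_splits(load_summary("dataset_root/_metadata"))) would then 
list the splits, where "dataset_root" is a placeholder path.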

> [Python][C++][Dataset] Implement split_row_groups for ParquetDataset
> --------------------------------------------------------------------
>
>                 Key: ARROW-2801
>                 URL: https://issues.apache.org/jira/browse/ARROW-2801
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Robbie Gruener
>            Priority: Minor
>              Labels: dataset, dataset-parquet-read, parquet, 
> pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the split_row_groups argument in ParquetDataset raises a 
> not-implemented error. An easy and efficient way to implement this is by 
> using the summary metadata file instead of opening every footer file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
