[https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838442#comment-16838442]
Joris Van den Bossche commented on ARROW-3424:
----------------------------------------------
{{ParquetDataset}} already supports a list of files, so something like the
following works (which I think addresses the SO question):
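{code:python}
import pyarrow.parquet as pq

# Pass an explicit list of file paths instead of a directory name
dataset = pq.ParquetDataset(['part0.parquet', 'part1.parquet'])
df = dataset.read_pandas().to_pandas()
{code}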
Do we think that is enough support? If so, I think this issue can be closed.
Or do we want to add this to {{pq.read_table}}? That function already accepts,
e.g., a directory name, which it passes through to {{ParquetDataset}}; we could
do a similar pass-through for a list of paths.
> [Python] Improved workflow for loading an arbitrary collection of Parquet
> files
> -------------------------------------------------------------------------------
>
> Key: ARROW-3424
> URL: https://issues.apache.org/jira/browse/ARROW-3424
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: parquet
> Fix For: 0.14.0
>
>
> See SO question for use case:
> https://stackoverflow.com/questions/52613682/load-multiple-parquet-files-into-dataframe-for-analysis