[ https://issues.apache.org/jira/browse/ARROW-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978256#comment-16978256 ]
Joris Van den Bossche commented on ARROW-7208:
----------------------------------------------
Looking at the ParquetDataset docs
(https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html),
it is indeed not clear how to read only part of a dataset.
A ParquetDataset consists of several {{ParquetDatasetPiece}} objects, accessible
through the {{pieces}} attribute, and each piece can be read individually.
However, this part of the API is not really documented. If you only want to
read a single file from the directory, you can also create a {{ParquetFile}},
passing the full file path rather than the directory.
> [Python] Passing directory to ParquetFile class gives confusing error message
> -----------------------------------------------------------------------------
>
> Key: ARROW-7208
> URL: https://issues.apache.org/jira/browse/ARROW-7208
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.1
> Reporter: Roelant Stegmann
> Priority: Major
>
> We are somehow getting the same errors. We are working with pyarrow 0.15.1 and
> trying to access a folder of `parquet` files generated with Amazon Athena.
> ```python
> import pyarrow.parquet as pq
> table2 = pq.read_table('C:/Data/test-parquet')
> ```
> works fine in contrast to
> ```python
> parquet_file = pq.ParquetFile('C:/Data/test-parquet')
> # parquet_file.read_row_group(0)
> ```
> which raises
> `ArrowIOError: Failed to open local file 'C:/Data/test-parquet', error:
> Access is denied.`