[ https://issues.apache.org/jira/browse/ARROW-13763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405083#comment-17405083 ]
Antoine Pitrou commented on ARROW-13763:
----------------------------------------
Thanks for the report. It seems that, when a file or directory path is given
(as opposed to an open file object), Arrow should explicitly close all files it
opens by itself.
Some of this may be in the C++ dataset layer, and some in the Python Parquet
wrapper.
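For illustration, a minimal sketch of the two call patterns (assuming a local
file named data.parquet; this is not taken from the attached test.py). Until
the above is fixed, opening the file yourself is one way to get deterministic
cleanup:

{code:python}
import pyarrow.parquet as pq

# Path given: Arrow opens the file itself, and today closing that handle
# is left to the Python garbage collector rather than done explicitly.
table = pq.read_table("data.parquet")

# Open file object given: the caller owns the handle, so a `with` block
# closes it deterministically regardless of what Arrow does internally.
with open("data.parquet", "rb") as f:
    table = pq.read_table(f)
{code}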
> [Python] Files opened for read with pyarrow.parquet are not explicitly closed
> -----------------------------------------------------------------------------
>
> Key: ARROW-13763
> URL: https://issues.apache.org/jira/browse/ARROW-13763
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet, Python
> Affects Versions: 5.0.0
> Environment: fsspec 2021.4.0
> Reporter: Richard Kimoto
> Priority: Major
> Attachments: test.py
>
>
> It appears that files opened for read using pyarrow.parquet.read_table (and
> therefore pyarrow.parquet.ParquetDataset) are not explicitly closed.
> This seems to be the case for both use_legacy_dataset=True and False. The
> files don't remain open at the OS level (verified using lsof); they do,
> however, seem to rely on the Python garbage collector to close them.
> My use case is that I'd like to use a custom fsspec filesystem that
> interfaces with an S3-like API. It handles the remote download of the
> parquet file and passes pyarrow a handle to a temporary file downloaded
> locally. It then relies on an explicit close() or __exit__() call to clean
> up the temp file.
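> A minimal sketch of that pattern (class names are hypothetical, and the
> remote fetch is simulated by a local copy): the wrapper only removes its
> temp file when close() is actually called, which is why relying on the
> garbage collector is a problem here.
>
> {code:python}
> import os
> import shutil
> import tempfile
>
> import fsspec
>
>
> class TempDownloadFileSystem(fsspec.AbstractFileSystem):
>     """Stand-in for the S3-like filesystem: 'downloads' by copying a
>     local file into a temp file and returns a self-cleaning handle."""
>     protocol = "tmpdl"
>
>     def _open(self, path, mode="rb", block_size=None, **kwargs):
>         tmp = tempfile.NamedTemporaryFile(delete=False)
>         with open(path, "rb") as src:  # real code would fetch remotely
>             shutil.copyfileobj(src, tmp)
>         tmp.seek(0)
>         return _CleanupFile(tmp)
>
>
> class _CleanupFile:
>     """Deletes the backing temp file once close() is explicitly called."""
>
>     def __init__(self, f):
>         self._f = f
>
>     def close(self):
>         if not self._f.closed:
>             name = self._f.name
>             self._f.close()
>             os.unlink(name)  # never runs if nobody calls close()
>
>     def __enter__(self):
>         return self
>
>     def __exit__(self, *exc):
>         self.close()
>
>     def __getattr__(self, name):
>         return getattr(self._f, name)  # delegate read(), seek(), etc.
> {code}
>
> (A real filesystem passed to read_table via its filesystem= argument would
> also need the usual fsspec metadata methods such as info(); they are
> omitted here for brevity.)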
--
This message was sent by Atlassian Jira
(v8.3.4#803005)