[ 
https://issues.apache.org/jira/browse/ARROW-13763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405094#comment-17405094
 ] 

Joris Van den Bossche edited comment on ARROW-13763 at 8/26/21, 9:30 AM:
-------------------------------------------------------------------------

For reference, the GitHub issue where I already answered a bit: 
https://github.com/apache/arrow/issues/10965

In {{pyarrow.parquet.ParquetFile}}, we indeed don't close the file, nor do we 
have a {{close}} method to do so. The Parquet reader seems to get a 
RandomAccessFile handle, created with {{ReadableFile}} to open the file (by 
creating an OSFile). The C++ ReadableFile also doesn't seem to have a public 
method to close it (there is a private {{DoClose}}; should that be made public 
so that higher layers can ensure the ReadableFile is closed after use?)
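
Until something like that exists, the caller can own the handle and close it 
explicitly. The following is only a minimal, stdlib-only sketch of that 
pattern, not pyarrow API: {{TempFileHandle}} and {{read_and_close}} are 
hypothetical names, and the temp-file handle merely stands in for whatever a 
custom filesystem would return.

```python
import os
import tempfile
from contextlib import closing


class TempFileHandle:
    """Hypothetical stand-in for a handle returned by a custom filesystem.

    It wraps a local temporary copy of a remote object and removes that
    copy when close() is called explicitly.
    """

    def __init__(self, payload: bytes):
        fd, self.path = tempfile.mkstemp(suffix=".parquet")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)  # simulates the remote download
        self._file = open(self.path, "rb")

    # Minimal file-like surface, delegated to the underlying file object.
    def read(self, size=-1):
        return self._file.read(size)

    def seek(self, offset, whence=0):
        return self._file.seek(offset, whence)

    def tell(self):
        return self._file.tell()

    @property
    def closed(self):
        return self._file.closed

    def close(self):
        # Close the file first, then clean up the temporary copy.
        if not self._file.closed:
            self._file.close()
        if os.path.exists(self.path):
            os.remove(self.path)


def read_and_close(handle, reader):
    """Run reader(handle) and close the handle even if the reader raises."""
    with closing(handle):
        return reader(handle)
```

With pyarrow installed, {{reader}} could be {{pyarrow.parquet.read_table}}, 
which accepts a file-like object as its source, provided the handle implements 
the file-like methods pyarrow expects. Cleanup then no longer depends on the 
Python GC.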


> [Python] Files opened for read with pyarrow.parquet are not explicitly closed
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-13763
>                 URL: https://issues.apache.org/jira/browse/ARROW-13763
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 5.0.0
>         Environment: fsspec 2021.4.0
>            Reporter: Richard Kimoto
>            Priority: Major
>             Fix For: 6.0.0
>
>         Attachments: test.py
>
>
> It appears that files opened for read using pyarrow.parquet.read_table (and 
> therefore pyarrow.parquet.ParquetDataset) are not explicitly closed.  
> This seems to be the case for both use_legacy_dataset=True and False.  The 
> files don't remain open at the OS level (verified using lsof).  They do, 
> however, seem to rely on the Python GC to close.  
> My use case is that I'd like to use a custom fsspec filesystem that 
> interfaces to an S3-like API. It handles the remote download of the Parquet 
> file and passes pyarrow a handle to a temporary file downloaded locally.  
> It then looks for an explicit close() or __exit__() to clean up the 
> temp file.  



