[ https://issues.apache.org/jira/browse/ARROW-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506466#comment-16506466 ]
Aldrin commented on ARROW-2683:
-------------------------------

For convenience, I have attached two files:
 * simple.parquet - parquet file containing simple pyarrow.Table data created with the following code:
{noformat}
import pandas
import pyarrow
from pyarrow import parquet

table = pyarrow.Table.from_pandas(
    pandas.DataFrame({
        'A': range(5),
        'B': [val * 5 for val in range(5)]
    })
)

parquet.write_table(table, 'simple.parquet')

# sanity check
parquet.read_table('simple.parquet'){noformat}
 * parquetread_test.py - python unittest that opens simple.parquet using pyarrow.parquet.read_table()

> Resource Warning (Unclosed File) when using pyarrow.parquet.read_table()
> ------------------------------------------------------------------------
>
>                 Key: ARROW-2683
>                 URL: https://issues.apache.org/jira/browse/ARROW-2683
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Aldrin
>            Priority: Major
>         Attachments: parquetread_test.py, simple.parquet
>
>
> pyarrow version from python repl:
> {noformat}
> >>> import pyarrow
> >>> pyarrow.__version__
> '0.9.0.post1'{noformat}
> python interpreter information:
> {noformat}
> Python 3.6.5 (default, Mar 30 2018, 06:42:10)
> [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin{noformat}
> arbitrary, potentially relevant system information:
> {noformat}
> OS                        : macOS High Sierra (10.13.4)
> homebrew package          : python: stable 3.6.5 (bottled), devel 3.7.0b4, HEAD
> pip version               : pip 10.0.1
> pipenv version            : pipenv, version 2018.05.18
> pyarrow version (via pip) : pyarrow 0.9.0.post1
> cython version (via pip)  : Cython 0.28.2{noformat}
>
> Issue Description:
> I see a ResourceWarning, which doesn't seem to be an error, but seems
> important enough (a.k.a. annoying enough) that I thought it would be worth
> asking about. [~xhochy] was nice enough to respond in #general in the arrow
> slack.
> The main problem is as follows:
> # with this code in a python unittest:
> {noformat}
> def test_arrow_from_parquet(self):
>     table = parquet.read_table(<path as str>){noformat}
> I see this warning:
> {noformat}
> ResourceWarning: unclosed file <_io.BufferedReader name=<path_to_file>{noformat}
> # I tried adding the following, per Uwe's request:
> {noformat}
> warnings.simplefilter("error"){noformat}
> # I then see this information:
> {noformat}
> test_arrow_from_parquet (tests.datalayer_test.TestFileReader) ... Exception ignored in: <_io.FileIO name=<path_to_file> mode='rb' closefd=True>
> ResourceWarning: unclosed file <_io.BufferedReader name=<path_to_file>>{noformat}
> # Uwe's thoughts:
> {noformat}
> That could be a valid error. We don’t seem to close the file we open in `ParquetFile.__init__`{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
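
A note on steps 2 and 3 of the issue: the sketch below uses only the standard library to show, roughly, why the warning appears and why warnings.simplefilter("error") prints "Exception ignored in:" instead of failing the test. The ResourceWarning is emitted from the file object's finalizer, and an exception raised there cannot propagate to the caller, so the interpreter only reports it. The helper names (read_without_closing, read_with_context_manager, count_resource_warnings, demo) are hypothetical and not part of pyarrow. This illustrates the general leak pattern, not pyarrow's actual code path, and it assumes CPython, where a dropped file handle is finalized promptly.

```python
import gc
import os
import tempfile
import warnings


def read_without_closing(path):
    """Leak pattern: the handle is opened and dropped without close()."""
    handle = open(path, "rb")
    return handle.read()  # handle is finalized later, still open


def read_with_context_manager(path):
    """Fixed pattern: the with-block closes the handle deterministically."""
    with open(path, "rb") as handle:
        return handle.read()


def count_resource_warnings(reader, path):
    """Run reader(path) and count the ResourceWarnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default, so enable it explicitly.
        warnings.simplefilter("always")
        reader(path)
        gc.collect()  # make sure any dropped handle has been finalized
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


def demo():
    """Return (warnings from leaky reader, warnings from fixed reader)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"placeholder bytes")
        os.close(fd)
        leaky = count_resource_warnings(read_without_closing, path)
        clean = count_resource_warnings(read_with_context_manager, path)
    finally:
        os.remove(path)
    return leaky, clean


if __name__ == "__main__":
    print(demo())
```

On CPython, demo() should report at least one ResourceWarning for the first reader and none for the second. As an untested assumption, the same idea might serve as a workaround here: open the file in a with-block and pass the file object to parquet.read_table() instead of the path, provided the installed pyarrow version accepts file-like objects there.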