[
https://issues.apache.org/jira/browse/ARROW-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506466#comment-16506466
]
Aldrin commented on ARROW-2683:
-------------------------------
For convenience, I have attached two files:
* simple.parquet - parquet file containing simple pyarrow.Table data created
with the following code:
{noformat}
import pandas
import pyarrow
from pyarrow import parquet

# build a small two-column table from a pandas DataFrame
table = pyarrow.Table.from_pandas(
    pandas.DataFrame({
        'A': range(5),
        'B': [val * 5 for val in range(5)]
    })
)
parquet.write_table(table, 'simple.parquet')

# sanity check
parquet.read_table('simple.parquet'){noformat}
* parquetread_test.py - python unittest that opens simple.parquet using
pyarrow.parquet.read_table()
> Resource Warning (Unclosed File) when using pyarrow.parquet.read_table()
> ------------------------------------------------------------------------
>
> Key: ARROW-2683
> URL: https://issues.apache.org/jira/browse/ARROW-2683
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Reporter: Aldrin
> Priority: Major
> Attachments: parquetread_test.py, simple.parquet
>
>
> pyarrow version from python repl:
> {noformat}
> >>> import pyarrow
> >>> pyarrow.__version__
> '0.9.0.post1'{noformat}
> python interpreter information:
> {noformat}
> Python 3.6.5 (default, Mar 30 2018, 06:42:10)
> [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin{noformat}
> arbitrary, potentially relevant system information:
> {noformat}
> OS : macOS High Sierra (10.13.4)
> homebrew package : python: stable 3.6.5 (bottled), devel 3.7.0b4, HEAD
> pip version : pip 10.0.1
> pipenv version : pipenv, version 2018.05.18
> pyarrow version (via pip): pyarrow 0.9.0.post1
> cython version (via pip) : Cython 0.28.2{noformat}
>
> Issue Description:
> I see a ResourceWarning, which doesn't seem to be an error, but seems
> important enough (a.k.a. annoying enough) that I thought it would be worth
> asking about. [~xhochy] was nice enough to respond in #general in the arrow
> slack.
> The main problem is as follows:
> # with this code in a python unittest:
> {noformat}
> def test_arrow_from_parquet(self):
>     table = parquet.read_table(<path as str>){noformat}
> I see this warning:
> {noformat}
> ResourceWarning: unclosed file <_io.BufferedReader
> name=<path_to_file>>{noformat}
> # I tried adding the following, per Uwe's request:
> {noformat}
> warnings.simplefilter("error"){noformat}
> # I then see this information:
> {noformat}
> test_arrow_from_parquet (tests.datalayer_test.TestFileReader) ... Exception
> ignored in: <_io.FileIO name=<path_to_file> mode='rb' closefd=True>
> ResourceWarning: unclosed file <_io.BufferedReader
> name=<path_to_file>>{noformat}
> # Uwe's thoughts:
> {noformat}
> That could be a valid error. We don’t seem to close the file we open in
> `ParquetFile.__init__`{noformat}
>
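A note on the "Exception ignored in" output quoted above: the ResourceWarning is issued from the file object's finalizer, where an exception cannot propagate, so turning warnings into errors only prints that message and the test still passes. A test that wants to fail on the leak could record the warnings instead. Below is a minimal, standard-library-only sketch of that idea. The names leak_file, collect_resource_warnings and demo are hypothetical helpers made up for illustration, and leak_file is only a stand-in for a reader that never closes its file; it is not pyarrow code.

```python
import gc
import os
import tempfile
import warnings


def leak_file(path):
    """Hypothetical stand-in for a reader that opens a file and never closes it."""
    handle = open(path, "rb")
    handle.read()
    # No handle.close(): the handle is simply dropped when the function returns.


def collect_resource_warnings(func, *args):
    """Run func(*args) and return the ResourceWarnings recorded while it ran."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of raising it; raising inside a
        # finalizer would only produce "Exception ignored in: ...".
        warnings.simplefilter("always")
        func(*args)
        # Force finalization of any handles that are no longer referenced.
        gc.collect()
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


def demo():
    """Create a temporary file, leak a handle to it, and report the warnings."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return collect_resource_warnings(leak_file, path)
    finally:
        os.remove(path)


leaks = demo()
for leak in leaks:
    print(leak.category.__name__, leak.message)
```

In a unittest, the same helper could wrap the parquet.read_table() call and the test could assert that the returned list is empty; that usage is an assumption about how one might apply it, not something taken from the attached test file.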