Kun Liu created ARROW-6060:
------------------------------
Summary: too large memory cost using pyarrow.parquet.read_table
with use_threads=True
Key: ARROW-6060
URL: https://issues.apache.org/jira/browse/ARROW-6060
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Reporter: Kun Liu
I tried to load a Parquet file of about 1.8 GB using the following code. It
crashed with an out-of-memory error.
{code:python}
import pyarrow.parquet as pq
pq.read_table('/tmp/test.parquet'){code}
However, it worked well with use_threads=False, as follows:
{code:python}
pq.read_table('/tmp/test.parquet', use_threads=False){code}
If pyarrow is downgraded to 0.12.1, there is no such problem.
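To compare the two settings without the out-of-memory crash taking down the calling session, each read can be run in its own interpreter and asked to report its peak resident set size. The following is only a rough sketch of that idea. The helper names run_isolated and read_snippet are hypothetical and not part of pyarrow. It assumes a Unix-like system (the resource module is not available on Windows) and pyarrow installed in the child interpreter. Note that ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.

```python
import subprocess
import sys

# Appended to every snippet so the child reports its own peak resident set size.
# ru_maxrss is in kilobytes on Linux and in bytes on macOS.
_REPORT = "\nimport resource; print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)"


def run_isolated(snippet):
    """Run `snippet` in a fresh interpreter; return (exit_code, peak_rss or None)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet + _REPORT],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # The child failed (for example, it was killed for using too much
        # memory) before it could print its peak RSS.
        return proc.returncode, None
    return proc.returncode, int(proc.stdout.strip().splitlines()[-1])


def read_snippet(path, use_threads):
    """Build the child-side code that performs a single pq.read_table call."""
    return (
        "import pyarrow.parquet as pq\n"
        f"pq.read_table({path!r}, use_threads={use_threads!r})"
    )


if __name__ == "__main__":
    # Same file path as in the report above; each setting runs in isolation.
    for use_threads in (True, False):
        code, rss = run_isolated(read_snippet("/tmp/test.parquet", use_threads))
        print(f"use_threads={use_threads}: exit={code}, peak_rss={rss}")
```

A non-zero exit code with peak_rss=None for one setting and a normal result for the other would match the behaviour described above. Running the same script under 0.12.1 and 0.14.1 would give a like-for-like comparison.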
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)