Kun Liu created ARROW-6060:
------------------------------

Summary: too large memory cost using pyarrow.parquet.read_table with use_threads=True
Key: ARROW-6060
URL: https://issues.apache.org/jira/browse/ARROW-6060
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Reporter: Kun Liu
I tried to load a Parquet file of about 1.8 GB using the following code, and it crashed with an out-of-memory error:
{code:python}
import pyarrow.parquet as pq

pq.read_table('/tmp/test.parquet')  # use_threads defaults to True
{code}
However, it worked well with use_threads=False:
{code:python}
pq.read_table('/tmp/test.parquet', use_threads=False)
{code}
If pyarrow is downgraded to 0.12.1, the problem does not occur.