Kun Liu created ARROW-6060:
------------------------------

             Summary: Excessive memory usage in pyarrow.parquet.read_table with use_threads=True
                 Key: ARROW-6060
                 URL: https://issues.apache.org/jira/browse/ARROW-6060
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.14.1
            Reporter: Kun Liu


I tried to load a Parquet file of about 1.8 GB using the following code, and it crashed with an out-of-memory error.
{code:python}
import pyarrow.parquet as pq
pq.read_table('/tmp/test.parquet'){code}
However, it works with use_threads=False:
{code:python}
pq.read_table('/tmp/test.parquet', use_threads=False){code}
After downgrading pyarrow to 0.12.1, the problem does not occur.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
