[ 
https://issues.apache.org/jira/browse/ARROW-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010445#comment-16010445
 ] 

Wes McKinney commented on ARROW-1017:
-------------------------------------

[~jporritt] the binaries on conda-forge have been updated to include this bug 
fix. We are working to get the 0.4.0 release ready this week, so it should be 
out officially in the next week or so.

> Python: Table.to_pandas leaks memory
> ------------------------------------
>
>                 Key: ARROW-1017
>                 URL: https://issues.apache.org/jira/browse/ARROW-1017
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.3.0
>            Reporter: James Porritt
>            Assignee: Wes McKinney
>             Fix For: 0.4.0
>
>
> Running the following code results in ever increasing memory usage, even 
> though I would expect the dataframe to be garbage collected when it goes out 
> of scope. For the size of my parquet file, I see the usage increasing about 
> 3GB per loop:
> {code}
> from pyarrow import HdfsClient
> def read_parquet_file(client, parquet_file):
>     parquet = client.read_parquet(parquet_file)
> >     df = parquet.to_pandas()  # df should be garbage collected on return
> client = HdfsClient("hdfshost", 8020, "myuser", driver='libhdfs3')
> parquet_file = '/my/parquet/file'
> while True:
>     read_parquet_file(client, parquet_file)
> {code}
> Is there a reference count issue similar to ARROW-362?
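The garbage-collection behavior the reporter expects can be sketched without pyarrow, using a hypothetical stand-in object and a weak reference to confirm that a value local to a function is reclaimed once it goes out of scope (`FakeTable` and `read_once` are illustrative names, not part of the pyarrow API):

```python
import gc
import weakref

class FakeTable:
    """Hypothetical stand-in for the table/dataframe; no pyarrow required."""
    def __init__(self):
        self.data = bytearray(10**6)  # simulate a large buffer

def read_once():
    obj = FakeTable()
    ref = weakref.ref(obj)
    return ref  # obj goes out of scope when the function returns

ref = read_once()
gc.collect()  # not strictly needed in CPython, but makes the intent explicit
assert ref() is None  # the object was reclaimed, so memory should not grow
```

If the underlying Arrow buffers hold an extra reference (as in ARROW-362), the equivalent check against the real objects would fail, and memory would grow on each loop iteration as the reporter observed.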



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
