[jira] [Commented] (ARROW-3324) [Python] Users reporting memory leaks using pa.pq.ParquetDataset

Wes McKinney (JIRA) Fri, 26 Oct 2018 15:41:18 -0700


    [ https://issues.apache.org/jira/browse/ARROW-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665729#comment-16665729 ]


Wes McKinney commented on ARROW-3324:
-------------------------------------

Here's another memory leak report, taken from the Stack Overflow question linked below the code:

{code}
import resource
import random
import string
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd


def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

schema = pa.schema([
    pa.field('test', pa.string()),
])

resource.setrlimit(resource.RLIMIT_NOFILE, (1000000, 1000000))
number_files = 10000
number_rows_increment = 1000
number_iterations = 100

writers = [pq.ParquetWriter('test_' + id_generator() + '.parquet', schema)
           for _ in range(number_files)]

for i in range(number_iterations):
    for writer in writers:
        table_to_write = pa.Table.from_pandas(
            pd.DataFrame({'test': [id_generator()
                                   for _ in range(number_rows_increment)]}),
            preserve_index=False,
            schema=schema,
            nthreads=1)
        table_to_write = table_to_write.replace_schema_metadata(None)
        writer.write_table(table_to_write)
    print(i)

for writer in writers:
    writer.close()
{code}

https://stackoverflow.com/questions/53016802/memory-leak-from-pyarrow
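
To put numbers on a report like this, one option is to sample the process's peak resident set size after each outer iteration. The following is only a minimal sketch using the standard-library resource module (Unix only); the helper names peak_rss_mib and track_peak_rss are hypothetical and do not come from the report or from pyarrow.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_peak_rss(step, iterations):
    """Call step(i) for each iteration and record peak RSS after each call."""
    samples = []
    for i in range(iterations):
        step(i)
        samples.append(peak_rss_mib())
    return samples


if __name__ == "__main__":
    held = []
    # Stand-in workload (hypothetical): keep one 1 MiB buffer per iteration.
    samples = track_peak_rss(lambda i: held.append(bytes(1024 * 1024)), 20)
    print("first sample: %.1f MiB, last sample: %.1f MiB"
          % (samples[0], samples[-1]))
```

In the script above, the body of the outer loop would play the role of step. Peak RSS is a high-water mark, so it never decreases, and a rising value is not proof of a leak on its own: open writers may legitimately buffer data, and the allocator may hold on to freed memory.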

> [Python] Users reporting memory leaks using pa.pq.ParquetDataset
> ----------------------------------------------------------------
>
>                 Key: ARROW-3324
>                 URL: https://issues.apache.org/jira/browse/ARROW-3324
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.12.0
>
>
> See:
> * https://github.com/apache/arrow/issues/2614
> * https://github.com/apache/arrow/issues/2624



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
