[jira] [Commented] (ARROW-1311) python hangs after write a few parquet tables

Wes McKinney (JIRA) Wed, 02 Aug 2017 15:23:22 -0700

    [ https://issues.apache.org/jira/browse/ARROW-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111836#comment-16111836 ]


Wes McKinney commented on ARROW-1311:
-------------------------------------

Cool, thank you! And very sorry about the trouble. We would have learned about 
these problems with jemalloc earlier, but we only made it the default allocator 
in 0.5.0. It's good to know about this, so we can work with the jemalloc 
developers to figure out what's wrong.

> python hangs after write a few parquet tables
> ---------------------------------------------
>
>                 Key: ARROW-1311
>                 URL: https://issues.apache.org/jira/browse/ARROW-1311
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.5.0
>         Environment: Python 3.5.2, pyarrow 0.5.0
>            Reporter: Keith Curtis
>            Assignee: Wes McKinney
>             Fix For: 0.6.0
>
>         Attachments: backtrace.txt
>
>
> I had a program that read some CSV files (a few million rows each, 9 columns) 
> and converted each one to Parquet with:
> ```python
> import os
> import pandas as pd
> import pyarrow
> import pyarrow.parquet as pq
>
>
> def to_parquet(output_file, csv_file):
>     # Load the CSV into pandas, convert it to an Arrow table, write Parquet.
>     df = pd.read_csv(csv_file)
>     table = pyarrow.Table.from_pandas(df)
>     pq.write_table(table, output_file)
> ```
> The first CSV file would always complete, but Python would hang on the second 
> or third file, and sometimes on a much later file.
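
For anyone trying to narrow down a hang like the one described above, one option is a small driver that records which file is being converted and asks the standard-library `faulthandler` module to dump the Python stack if a single conversion runs too long. This is only a sketch under stated assumptions: `convert_all`, `timeout_s`, and the log file name `hang_traceback.txt` are hypothetical names that are not part of the original report, and the dump covers Python frames only, so a hang inside native code would show up only as the Python line that called into it.

```python
import faulthandler
from pathlib import Path


def convert_all(csv_dir, out_dir, convert, timeout_s=300.0,
                log_path="hang_traceback.txt"):
    """Call convert(output_file, csv_file) for every *.csv file in csv_dir.

    Hypothetical helper: if one conversion takes longer than timeout_s
    seconds, faulthandler writes the Python traceback of all threads to
    log_path. The conversion itself is not interrupted.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(log_path, "w") as log:
        for csv_file in sorted(Path(csv_dir).glob("*.csv")):
            output_file = out_dir / (csv_file.stem + ".parquet")
            # Record progress first, so the log names the file that hung.
            log.write("converting %s\n" % csv_file)
            log.flush()
            # Arm the watchdog for this file only.
            faulthandler.dump_traceback_later(timeout_s, file=log)
            try:
                convert(str(output_file), str(csv_file))
            finally:
                # Disarm it whether the conversion succeeded or raised.
                faulthandler.cancel_dump_traceback_later()
            written.append(output_file)
    return written
```

With the `to_parquet` function from the report, a call might look like `convert_all("csv_files", "parquet_files", to_parquet)`, where both directory names are placeholders. The last `converting ...` line in the log then identifies the file on which the process stopped making progress.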



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
