[jira] [Commented] (ARROW-1311) python hangs after write a few parquet tables

Keith Curtis (JIRA) Wed, 02 Aug 2017 15:11:31 -0700

    [ https://issues.apache.org/jira/browse/ARROW-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111813#comment-16111813 ]


Keith Curtis commented on ARROW-1311:
-------------------------------------

Hi, I think I have the updated version:
  $ pip install --upgrade pyarrow==0.5.*
Collecting pyarrow==0.5.*
  Downloading pyarrow-0.5.0.post1-cp35-cp35m-manylinux1_x86_64.whl (8.9MB)
  ...

I re-ran my script, but Python appeared to hang, and the stack trace looks
similar:

#0  je_spin_adaptive (spin=<synthetic pointer>) at include/jemalloc/internal/spin.h:40
#1  chunk_dss_max_update (new_addr=<optimized out>) at src/chunk_dss.c:83
#2  je_chunk_alloc_dss (tsdn=tsdn@entry=0x7f6d609ab620, arena=arena@entry=0x7f6ca8800140, new_addr=new_addr@entry=0x7f6c33000000, size=size@entry=8388608,
    alignment=alignment@entry=2097152, zero=zero@entry=0x7fff45db9850, commit=commit@entry=0x7fff45db97a0) at src/chunk_dss.c:122
#3  0x00007f6ca92bb02f in chunk_alloc_core (dss_prec=dss_prec_secondary, commit=0x7fff45db97a0, zero=0x7fff45db9850, alignment=2097152, size=8388608, new_addr=0x7f6c33000000,
    arena=0x7f6ca8800140, tsdn=0x7f6d609ab620) at src/chunk.c:357
#4  chunk_alloc_default_impl (commit=0x7fff45db97a0, zero=0x7fff45db9850, alignment=2097152, size=8388608, new_addr=0x7f6c33000000, arena=0x7f6ca8800140, tsdn=0x7f6d609ab620)
    at src/chunk.c:430
#5  je_chunk_alloc_wrapper (tsdn=tsdn@entry=0x7f6d609ab620, arena=arena@entry=0x7f6ca8800140, chunk_hooks=chunk_hooks@entry=0x7fff45db97c0, new_addr=new_addr@entry=0x7f6c33000000,
    size=size@entry=8388608, alignment=2097152, sn=sn@entry=0x7fff45db97b0, zero=zero@entry=0x7fff45db9850, commit=commit@entry=0x7fff45db97a0) at src/chunk.c:490
 ...


> python hangs after write a few parquet tables
> ---------------------------------------------
>
>                 Key: ARROW-1311
>                 URL: https://issues.apache.org/jira/browse/ARROW-1311
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.5.0
>         Environment: Python 3.5.2, pyarrow 0.5.0
>            Reporter: Keith Curtis
>            Assignee: Wes McKinney
>             Fix For: 0.6.0
>
>         Attachments: backtrace.txt
>
>
> I had a program that read some CSV files (a few million rows each, 9
> columns) and converted each one with:
> ```python
> import pandas as pd
> import pyarrow
> import pyarrow.parquet as pq
>
> def to_parquet(output_file, csv_file):
>     # Load the CSV into a DataFrame, convert it to an Arrow table,
>     # and write that table out as a Parquet file.
>     df = pd.read_csv(csv_file)
>     table = pyarrow.Table.from_pandas(df)
>     pq.write_table(table, output_file)
> ```
> The first CSV file would always complete, but Python would hang on the
> second or third file, or sometimes on a much later file.
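
Because the hang shows up on an unpredictable file, it can help to know which call is stuck. The following is a minimal sketch, not part of the original report, that uses the standard-library `faulthandler` module to dump the Python-level tracebacks of all threads if a single call runs longer than a chosen timeout. The helper name `run_with_watchdog` and the 60-second default are illustrative assumptions. It only reports Python frames; native frames such as the jemalloc ones in the backtrace above still need a debugger like gdb.

```python
import faulthandler
import sys


def run_with_watchdog(func, *args, timeout=60.0, log=sys.stderr):
    """Call func(*args); if it runs longer than `timeout` seconds,
    dump the Python tracebacks of all threads to `log`.

    Hypothetical helper for illustration; it reports a hang but does
    not interrupt it.
    """
    # Arm the watchdog: once `timeout` seconds have passed, the tracebacks
    # of all threads are written to `log`, which must be backed by a real
    # file descriptor.
    faulthandler.dump_traceback_later(timeout, file=log)
    try:
        return func(*args)
    finally:
        # Disarm the watchdog once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

With the function from the report, each conversion could then be wrapped as `run_with_watchdog(to_parquet, output_file, csv_file)`, so that a stalled file produces a traceback on stderr instead of silence.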



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
