[ https://issues.apache.org/jira/browse/ARROW-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111813#comment-16111813 ]
Keith Curtis commented on ARROW-1311:
-------------------------------------
Hi, I think I have the updated one:
$ pip install --upgrade pyarrow==0.5.*
Collecting pyarrow==0.5.*
Downloading pyarrow-0.5.0.post1-cp35-cp35m-manylinux1_x86_64.whl (8.9MB)
...
I re-ran my script, but python appeared to hang, and the stack trace looks
similar:
#0 je_spin_adaptive (spin=<synthetic pointer>) at include/jemalloc/internal/spin.h:40
#1 chunk_dss_max_update (new_addr=<optimized out>) at src/chunk_dss.c:83
#2 je_chunk_alloc_dss (tsdn=tsdn@entry=0x7f6d609ab620, arena=arena@entry=0x7f6ca8800140, new_addr=new_addr@entry=0x7f6c33000000, size=size@entry=8388608, alignment=alignment@entry=2097152, zero=zero@entry=0x7fff45db9850, commit=commit@entry=0x7fff45db97a0) at src/chunk_dss.c:122
#3 0x00007f6ca92bb02f in chunk_alloc_core (dss_prec=dss_prec_secondary, commit=0x7fff45db97a0, zero=0x7fff45db9850, alignment=2097152, size=8388608, new_addr=0x7f6c33000000, arena=0x7f6ca8800140, tsdn=0x7f6d609ab620) at src/chunk.c:357
#4 chunk_alloc_default_impl (commit=0x7fff45db97a0, zero=0x7fff45db9850, alignment=2097152, size=8388608, new_addr=0x7f6c33000000, arena=0x7f6ca8800140, tsdn=0x7f6d609ab620) at src/chunk.c:430
#5 je_chunk_alloc_wrapper (tsdn=tsdn@entry=0x7f6d609ab620, arena=arena@entry=0x7f6ca8800140, chunk_hooks=chunk_hooks@entry=0x7fff45db97c0, new_addr=new_addr@entry=0x7f6c33000000, size=size@entry=8388608, alignment=2097152, sn=sn@entry=0x7fff45db97b0, zero=zero@entry=0x7fff45db9850, commit=commit@entry=0x7fff45db97a0) at src/chunk.c:490
...
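The backtrace above shows only the native jemalloc frames. To also see which Python call is active when the hang occurs, a watchdog along the following lines might help. This is only a sketch: `run_with_watchdog`, its `timeout` default and its `file` parameter are hypothetical names chosen for illustration and are not part of the original script; `faulthandler` is in the Python standard library.

```python
import faulthandler
import sys


def run_with_watchdog(func, *args, timeout=60.0, file=None):
    """Call func(*args) and return its result.

    If the call is still running after `timeout` seconds, the tracebacks
    of all Python threads are written to `file` (default: sys.stderr).
    The call itself is not interrupted.
    """
    out = sys.stderr if file is None else file
    # Arm the watchdog; exit=False only reports and lets the process continue.
    faulthandler.dump_traceback_later(timeout, exit=False, file=out)
    try:
        return func(*args)
    finally:
        # Disarm the watchdog whether the call finished or raised.
        faulthandler.cancel_dump_traceback_later()
```

Assuming the conversion function from the issue description, a call might look like `run_with_watchdog(to_parquet, "out.parquet", "in.csv", timeout=120)`, where both file names are placeholders. The dump covers Python frames only, so a native backtrace from gdb is still needed for frames such as `je_spin_adaptive`.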
> python hangs after write a few parquet tables
> ---------------------------------------------
>
> Key: ARROW-1311
> URL: https://issues.apache.org/jira/browse/ARROW-1311
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.5.0
> Environment: Python 3.5.2, pyarrow 0.5.0
> Reporter: Keith Curtis
> Assignee: Wes McKinney
> Fix For: 0.6.0
>
> Attachments: backtrace.txt
>
>
> I had a program that read some CSV files (a few million rows each, 9 columns)
> and converted each one to Parquet with:
> ```python
> import os
> import pandas as pd
> import pyarrow.parquet as pq
> import pyarrow
>
>
> def to_parquet(output_file, csv_file):
>     # Load the CSV into a DataFrame, convert it to an Arrow table,
>     # then write that table out as a Parquet file.
>     df = pd.read_csv(csv_file)
>     table = pyarrow.Table.from_pandas(df)
>     pq.write_table(table, output_file)
> ```
> The first CSV file would always complete, but Python would hang on the second
> or third file, or sometimes on a much later file.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)