[
https://issues.apache.org/jira/browse/ARROW-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396514#comment-16396514
]
Wes McKinney commented on ARROW-2082:
-------------------------------------
Here's the backtrace for this:
{code}
#0 0x00007fffece34769 in arrow::PoolBuffer::Reserve (this=0x139c180,
capacity=1024) at ../src/arrow/buffer.cc:101
#1 0x00007fffece34b2f in arrow::PoolBuffer::Resize (this=0x139c180,
new_size=1024, shrink_to_fit=true) at ../src/arrow/buffer.cc:112
#2 0x00007fffcb5fc506 in parquet::AllocateBuffer (pool=0x7fffed519300
<completed>, size=1024) at ../src/parquet/util/memory.cc:501
#3 0x00007fffcb5fc75e in parquet::InMemoryOutputStream::InMemoryOutputStream
(this=0x1487090, pool=0x7fffed519300 <completed>, initial_capacity=1024) at
../src/parquet/util/memory.cc:423
#4 0x00007fffcb5335ca in
parquet::PlainEncoder<parquet::DataType<(parquet::Type::type)2> >::PlainEncoder
(this=0x7fffffff9170, descr=0x1104060, pool=0x7fffed519300 <completed>)
at ../src/parquet/encoding-internal.h:188
#5 0x00007fffcb5defa2 in
parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2>
>::PlainEncode (this=0xbbee60, src=@0xbbeec8: -729020189051312384,
dst=0x7fffffff9258)
at ../src/parquet/statistics.cc:228
#6 0x00007fffcb5def07 in
parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2>
>::EncodeMin (this=0xbbee60) at ../src/parquet/statistics.cc:204
#7 0x00007fffcb5df1c3 in
parquet::TypedRowGroupStatistics<parquet::DataType<(parquet::Type::type)2>
>::Encode (this=0xbbee60) at ../src/parquet/statistics.cc:219
#8 0x00007fffcb5348f7 in
parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2>
>::GetPageStatistics (this=0x81d2b0) at ../src/parquet/column_writer.cc:520
#9 0x00007fffcb52ca76 in parquet::ColumnWriter::AddDataPage (this=0x81d2b0) at
../src/parquet/column_writer.cc:386
#10 0x00007fffcb52c0eb in parquet::ColumnWriter::FlushBufferedDataPages
(this=0x81d2b0) at ../src/parquet/column_writer.cc:447
#11 0x00007fffcb52ddb0 in parquet::ColumnWriter::Close (this=0x81d2b0) at
../src/parquet/column_writer.cc:431
#12 0x00007fffcb4d6657 in parquet::arrow::(anonymous
namespace)::ArrowColumnWriter::Close (this=0x7fffffff9b48) at
../src/parquet/arrow/writer.cc:347
#13 0x00007fffcb4e758e in parquet::arrow::FileWriter::Impl::WriteColumnChunk
(this=0x15adee0, data=warning: RTTI symbol not found for class
'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray,
std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class
'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray,
std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 2, weak 0) 0x1717cc0, offset=0, size=5)
at ../src/parquet/arrow/writer.cc:982
#14 0x00007fffcb4d507b in parquet::arrow::FileWriter::WriteColumnChunk
(this=0x125bc30, data=warning: RTTI symbol not found for class
'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray,
std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class
'std::_Sp_counted_ptr_inplace<arrow::ChunkedArray,
std::allocator<arrow::ChunkedArray>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 2, weak 0) 0x1717cc0, offset=0, size=5)
at ../src/parquet/arrow/writer.cc:1011
#15 0x00007fffcb4d5ba6 in parquet::arrow::FileWriter::WriteTable
(this=0x125bc30, table=..., chunk_size=5) at ../src/parquet/arrow/writer.cc:1086
{code}
Not sure what's going wrong yet.
> [Python] SegFault in pyarrow.parquet.write_table with specific options
> ----------------------------------------------------------------------
>
> Key: ARROW-2082
> URL: https://issues.apache.org/jira/browse/ARROW-2082
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Environment: tested on macOS High Sierra (Python 3.6) and Ubuntu
> Xenial (Python 3.5)
> Reporter: Clément Bouscasse
> Priority: Major
> Fix For: 0.9.0
>
>
> I originally filed an issue in the pandas project but we've tracked it down
> to arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> basically using
> {code:java}
> df.to_parquet('filename.parquet', flavor='spark'){code}
> segfaults if `df` contains a datetime column.
> Under the covers, pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy',
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the github ticket.
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)