Shuai Zhang created ARROW-13676:
-----------------------------------
Summary: [C++] Coredump writing Arrow table to Parquet file
Key: ARROW-13676
URL: https://issues.apache.org/jira/browse/ARROW-13676
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Parquet
Affects Versions: 4.0.1, 5.0.0, 6.0.0
Reporter: Shuai Zhang
Attachments: callstack.txt
I'm hitting an intermittent coredump while converting user data from Google
Protobuf format to Apache Parquet files via the Apache Arrow C++ library. The
problem can be reproduced reliably with ASAN enabled for specific user
data. The call stack reported by ASAN is exactly the same as the coredump call
stack (see the attached file; compiled against apache-arrow-4.0.1 built without
jemalloc).
I made some initial investigations:
# Only the directly constructed Arrow table triggers this issue. Cloning it in
different ways yields different results, even though all of the clones compare
equal via the `table.Equals(other)` method.
## Serializing and then deserializing the table is safe.
## CombineChunks doesn't help.
## Cloning with TableBatchReader doesn't help.
## Applying CombineChunks or TableBatchReader cloning to the deserialized table
is still safe.
# The problem occurs in several different environments, so I think it is not
related to glibc:
## Debian 8 + gcc 4.9.2
## Debian 9 + gcc 6.3.0
## Debian 11 + gcc 10.2.1
## Ubuntu 20.04 LTS + clang 12.0.1
I still need to strip out code and data that are sensitive under our security
policy before I can provide a reliable reproduction. I'll come back in a few
days.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)