Shuai Zhang created ARROW-13676:
-----------------------------------

             Summary: [C++] Coredump writing Arrow table to Parquet file
                 Key: ARROW-13676
                 URL: https://issues.apache.org/jira/browse/ARROW-13676
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Parquet
    Affects Versions: 4.0.1, 5.0.0, 6.0.0
            Reporter: Shuai Zhang
         Attachments: callstack.txt

I'm seeing an intermittent core dump when converting user data from Google 
Protobuf format to Apache Parquet files via the Apache Arrow C++ library. The 
problem reproduces reliably with ASAN enabled for specific user data. The 
call stack reported by ASAN is exactly the same as the core dump call stack 
(see the attached file; built against apache-arrow-4.0.1 compiled without 
jemalloc). 

Findings from my initial investigation:

# The directly constructed Arrow table triggers this issue. Cloning it in 
different ways yields different results, even though all of the clones compare 
equal via the `table.Equals(other)` method.
## Serializing and then deserializing the table was safe.
## CombineChunks didn't help.
## Cloning with TableBatchReader didn't help.
## Applying CombineChunks or TableBatchReader cloning to the deserialized table 
was still safe.
# The problem occurs in several different environments, so I think the issue 
is not related to glibc:
## Debian 8 + gcc 4.9.2
## Debian 9 + gcc 6.3.0
## Debian 11 + gcc 10.2.1
## Ubuntu 20.04 LTS + clang 12.0.1

I still need to strip out code and data that are sensitive under our security 
policy before I can provide a reliable reproduction. I'll follow up in a few 
days.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
