[ https://issues.apache.org/jira/browse/ARROW-13676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402379#comment-17402379 ]
Micah Kornfield commented on ARROW-13676:
-----------------------------------------
Hi Shuai, Thank you for the report. I think I know what the problem is.
> [C++] Coredump writing Arrow table to Parquet file
> --------------------------------------------------
>
> Key: ARROW-13676
> URL: https://issues.apache.org/jira/browse/ARROW-13676
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Parquet
> Affects Versions: 5.0.0, 4.0.1, 6.0.0
> Reporter: Shuai Zhang
> Assignee: Micah Kornfield
> Priority: Critical
> Attachments: callstack.txt
>
>
> I am hitting an intermittent coredump when converting user data from
> Google Protobuf format to an Apache Parquet file via the Apache Arrow C++
> library. The problem reproduces reliably with ASAN enabled for specific
> user data. The call stack reported by ASAN is exactly the same as the coredump
> call stack (posted in the attachment; compiled with apache-arrow-4.0.1 built
> without jemalloc).
> I did some initial investigation:
> # The directly constructed Arrow table triggers this issue. Cloning it in
> different ways yields different results, even though all of the clones compare
> equal via the `table.Equals(other)` method. All of the tables pass
> `ValidateFull()`.
> ## Serializing then deserializing the table is safe.
> ## CombineChunks doesn't help.
> ## Cloning with TableBatchReader doesn't help.
> ## CombineChunks or TableBatchReader cloning on the deserialized table is
> still safe.
> # Several different environments trigger this problem, so I think the issue
> is not related to glibc:
> ## Debian 8 + gcc 4.9.2
> ## Debian 9 + gcc 6.3.0
> ## Debian 11 + gcc 10.2.1
> ## Ubuntu 20.04 LTS + clang 12.0.1
> The issue can be reproduced with
> https://github.com/hcoona/arrow/commit/8fa6cdb0c756c17ea3edc43b7b73c717823bda85
--
This message was sent by Atlassian Jira
(v8.3.4#803005)