[
https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726757#comment-16726757
]
Hatem Helal commented on PARQUET-1481:
--------------------------------------
Managed to reproduce this with a simple test against the latest Apache Arrow.
Here is a slightly nicer stack trace:
{{F1220 13:29:51.966117 2315707200 record_reader.cc:854] Check failed: false}}
{{*** Check failure stack trace: ***}}
{{ @ 0x1083c217a google::LogMessage::Fail()}}
{{ @ 0x1083c01de google::LogMessage::SendToLog()}}
{{ @ 0x1083c0e1f google::LogMessage::Flush()}}
{{ @ 0x1083c0c59 google::LogMessage::~LogMessage()}}
{{ @ 0x1083c0f15 google::LogMessage::~LogMessage()}}
{{ @ 0x10825d45c arrow::util::ArrowLog::~ArrowLog()}}
{{ @ 0x10825d4a5 arrow::util::ArrowLog::~ArrowLog()}}
{{ @ 0x107d5d936 parquet::internal::RecordReader::Make()}}
{{ @ 0x107cf8abd parquet::arrow::PrimitiveImpl::PrimitiveImpl()}}
{{ @ 0x107c69acd parquet::arrow::PrimitiveImpl::PrimitiveImpl()}}
{{ @ 0x107c68ba8 parquet::arrow::FileReader::Impl::GetColumn()}}
{{ @ 0x107c6b790 parquet::arrow::FileReader::Impl::GetReaderForNode()}}
{{ @ 0x107c6cb3d parquet::arrow::FileReader::Impl::ReadSchemaField()}}
{{ @ 0x107c79d60 parquet::arrow::FileReader::Impl::ReadTable()::$_1::operator()()}}
{{ @ 0x107c764ef parquet::arrow::FileReader::Impl::ReadTable()}}
{{ @ 0x107c7a9f5 parquet::arrow::FileReader::Impl::ReadTable()}}
{{ @ 0x107c7f5f7 parquet::arrow::FileReader::ReadTable()}}
{{ @ 0x107c6176c main}}
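The debug trace stops at a failed check ("Check failed: false") inside parquet::internal::RecordReader::Make, while the release pyarrow build in the original report crashed with SIGSEGV. One plausible explanation, which is an assumption and not confirmed by this comment, is a factory that only debug-asserts when it meets a column type it does not recognize. In a release build the assertion compiles away, the factory returns a null pointer, and the caller crashes later. The sketch below contrasts that pattern with a version that rejects the bad input with an exception. All names (PhysicalType, Reader, MakeAssertOnly, MakeChecked) are hypothetical and are not the parquet-cpp API.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a column's physical type as read from file metadata.
enum class PhysicalType : int { BOOLEAN = 0, INT32 = 1, DOUBLE = 5 };

// Hypothetical stand-in for a per-type record reader.
struct Reader {
  explicit Reader(std::string n) : name(std::move(n)) {}
  std::string name;
};

// Assert-only pattern: with NDEBUG defined (typical release builds) the assert
// expands to nothing, so an unexpected type falls through and returns nullptr,
// which a caller may later dereference.
std::shared_ptr<Reader> MakeAssertOnly(PhysicalType t) {
  switch (t) {
    case PhysicalType::BOOLEAN: return std::make_shared<Reader>("bool");
    case PhysicalType::INT32: return std::make_shared<Reader>("int32");
    case PhysicalType::DOUBLE: return std::make_shared<Reader>("double");
    default: assert(false && "unexpected physical type");
  }
  return nullptr;
}

// Defensive pattern: an out-of-range type (e.g. from a corrupt file) is
// reported as an error in every build mode instead of yielding nullptr.
std::shared_ptr<Reader> MakeChecked(PhysicalType t) {
  switch (t) {
    case PhysicalType::BOOLEAN: return std::make_shared<Reader>("bool");
    case PhysicalType::INT32: return std::make_shared<Reader>("int32");
    case PhysicalType::DOUBLE: return std::make_shared<Reader>("double");
  }
  throw std::invalid_argument("invalid physical type: " +
                              std::to_string(static_cast<int>(t)));
}
```

Calling MakeChecked(static_cast<PhysicalType>(42)) throws std::invalid_argument, which a reader could surface to Python as an ordinary error; the same value passed to MakeAssertOnly aborts in a debug build and returns nullptr in a release build.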
> [C++] SEGV when reading corrupt parquet file
> --------------------------------------------
>
> Key: PARQUET-1481
> URL: https://issues.apache.org/jira/browse/PARQUET-1481
> Project: Parquet
> Issue Type: Bug
> Reporter: Hatem Helal
> Assignee: Hatem Helal
> Priority: Major
> Attachments: corrupt.parquet
>
>
> >>> import pyarrow.parquet as pq
> >>> pq.read_table('corrupt.parquet')
> fish: 'python' terminated by signal SIGSEGV (Address boundary error)
>
> Stack report from macOS:
>
> 0 libsystem_kernel.dylib 0x00007fff51164cee __psynch_cvwait + 10
> 1 libsystem_pthread.dylib 0x00007fff512a1662 _pthread_cond_wait + 732
> 2 libc++.1.dylib 0x00007fff4f04acb0 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
> 3 libc++.1.dylib 0x00007fff4f04b728 std::__1::__assoc_sub_state::__sub_wait(std::__1::unique_lock<std::__1::mutex>&) + 46
> 4 libparquet.11.dylib 0x0000000115512d00 std::__1::__assoc_state<arrow::Status>::move() + 48
> 5 libparquet.11.dylib 0x00000001154faa15 parquet::arrow::FileReader::Impl::ReadTable(std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::shared_ptr<arrow::Table>*) + 1093
> 6 libparquet.11.dylib 0x00000001154fb6fe parquet::arrow::FileReader::Impl::ReadTable(std::__1::shared_ptr<arrow::Table>*) + 350
> 7 libparquet.11.dylib 0x00000001154fce47 parquet::arrow::FileReader::ReadTable(std::__1::shared_ptr<arrow::Table>*) + 23
> 8 _parquet.so 0x000000011598d97b __pyx_pw_7pyarrow_8_parquet_13ParquetReader_9read_all(_object*, _object*, _object*) + 1035