[
https://issues.apache.org/jira/browse/ARROW-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Keane closed ARROW-15664.
----------------------------------
Resolution: Duplicate
> [C++] parquet reader Segfaults with illegal SIMD instruction
> -------------------------------------------------------------
>
> Key: ARROW-15664
> URL: https://issues.apache.org/jira/browse/ARROW-15664
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 7.0.0
> Reporter: Jonathan Keane
> Priority: Critical
> Fix For: 7.0.1, 8.0.0
>
>
> When compiling with {{-Os}} (or with release type {{MinRelSize}}), and we run
> parquet tests (in R at least, though I imagine the pyarrow and C++ will have
> the same issues!) we get a segfault with an illegal opcode on systems that
> don't have BMI2 available when trying to read parquet files. (It turns out,
> the github runners for macos don't have BMI2, so this is easily testable
> there!)
> Somehow in the optimization combined with the way our runtime detection code
> works, the runtime detection we normally use for this fails (though it works
> just fine with {{-O2}}, {{-O3}}, etc.).
> When diagnosing this, I created a branch + PR that runs our R tests after
> installing from brew which can reliably cause this to happen:
> https://github.com/apache/arrow/pull/12364 other test suites that exercise
> parquet reading would probably have the same issue (or even C++ tests built
> with {{-Os}}.
> Here's a coredump:
> {code}
> 2491 Thread_829819
> + 2491 thread_start (in libsystem_pthread.dylib) + 15 [0x7ff801c3e00f]
> + 2491 _pthread_start (in libsystem_pthread.dylib) + 125 [0x7ff801c424f4]
> + 2491 void*
> std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct,
> std::__1::default_delete<std::__1::__thread_struct> >,
> arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3> >(void*) (in
> arrow.so) + 380 [0x109203749]
> + 2491 arrow::internal::FnOnce<void ()>::operator()() && (in arrow.so)
> + 26 [0x109201f30]
> + 2491 arrow::internal::FnOnce<void
> ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture,
> arrow::Future<std::__1::shared_ptr<arrow::ChunkedArray> >&,
> parquet::arrow::(anonymous
> namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous
> namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int>
> > const&, std::__1::vector<int, std::__1::allocator<int> > const&,
> arrow::internal::Executor*)::$_4&, unsigned long&,
> std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl> > >::invoke() (in
> arrow.so) + 98 [0x108f125c2]
> + 2491 parquet::arrow::(anonymous
> namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous
> namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int>
> > const&, std::__1::vector<int, std::__1::allocator<int> > const&,
> arrow::internal::Executor*)::$_4::operator()(unsigned long,
> std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl>) const (in arrow.so)
> + 47 [0x108f11ed5]
> + 2491 parquet::arrow::(anonymous
> namespace)::FileReaderImpl::ReadColumn(int, std::__1::vector<int,
> std::__1::allocator<int> > const&, parquet::arrow::ColumnReader*,
> std::__1::shared_ptr<arrow::ChunkedArray>*) (in arrow.so) + 273
> [0x108f0c037]
> + 2491 parquet::arrow::ColumnReaderImpl::NextBatch(long long,
> std::__1::shared_ptr<arrow::ChunkedArray>*) (in arrow.so) + 39 [0x108f0733b]
> + 2491 parquet::arrow::(anonymous
> namespace)::LeafReader::LoadBatch(long long) (in arrow.so) + 137
> [0x108f0794b]
> + 2491 parquet::internal::(anonymous
> namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1>
> >::ReadRecords(long long) (in arrow.so) + 442 [0x108f4f53e]
> + 2491 parquet::internal::(anonymous
> namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1>
> >::ReadRecordData(long long) (in arrow.so) + 471 [0x108f50503]
> + 2491 void
> parquet::internal::standard::DefLevelsToBitmapSimd<false>(short const*, long
> long, parquet::internal::LevelInfo,
> parquet::internal::ValidityBitmapInputOutput*) (in arrow.so) + 250
> [0x108fc2a5a]
> + 2491 long long
> parquet::internal::standard::DefLevelsBatchToBitmap<false>(short const*, long
> long, long long, parquet::internal::LevelInfo,
> arrow::internal::FirstTimeBitmapWriter*) (in arrow.so) + 63 [0x108fc34da]
> + 2491 ??? (in <unknown binary>) [0x600001354518]
> + 2491 _sigtramp (in libsystem_platform.dylib) +
> 29 [0x7ff801c57e2d]
> + 2491 sigactionSegv (in libR.dylib) + 649
> [0x1042598c9] main.c:625
> + 2491 Rstd_ReadConsole (in libR.dylib) +
> 2042 [0x10435160a] sys-std.c:1044
> + 2491 R_SelectEx (in libR.dylib) + 308
> [0x104350854] sys-std.c:178
> + 2491 __select (in
> libsystem_kernel.dylib) + 10 [0x7ff801c0de4a]
> {code}
> And then a disassembly (where you can see a SHLX that shouldn't be there):
> {code}
> Dump of assembler code from 0x13ac6db00 to 0x13ac6db99ff:
> ...
> --Type <RET> for more, q to quit, c to continue without paging--
> 0x000000013ac6db82: mov $0x8,%ecx
> 0x000000013ac6db87: sub %rax,%rcx
> 0x000000013ac6db8a: lea 0xf1520b(%rip),%rdi # 0x13bb82d9c
> 0x000000013ac6db91: movzbl (%rcx,%rdi,1),%edi
> 0x000000013ac6db95: mov %esi,%ebx
> 0x000000013ac6db97: and %edi,%ebx
> => 0x000000013ac6db99: shlx %rax,%rbx,%rax
> 0x000000013ac6db9e: or 0x18(%r15),%al
> 0x000000013ac6dba2: mov %al,0x18(%r15)
> 0x000000013ac6dba6: cmp %rdx,%rcx
> 0x000000013ac6dba9: jg 0x13ac6dbf5
> 0x000000013ac6dbab: mov %al,(%r14)
> 0x000000013ac6dbae: inc %r14
> 0x000000013ac6dbb1: shrx %rcx,%rsi,%rax
> 0x000000013ac6dbb6: mov %rax,-0x20(%rbp)
> 0x000000013ac6dbba: sub %rcx,%rdx
> 0x000000013ac6dbbd: mov %rdx,%rbx
> 0x000000013ac6dbc0: sar $0x3,%rbx
> 0x000000013ac6dbc4: and $0x7,%edx
> 0x000000013ac6dbc7: cmp $0x1,%rdx
> 0x000000013ac6dbcb: sbb $0xffffffffffffffff,%rbx
> 0x000000013ac6dbcf: lea -0x20(%rbp),%rsi
> 0x000000013ac6dbd3: mov %r14,%rdi
> ...
> {code}
> We discovered this because homebrew alters the default build flags and uses
> {{-Os}}, though we should include a test that tests this in our CI as well
> (at least as a nightly) to catch it earlier:
> https://github.com/Homebrew/homebrew-core/issues/94724
--
This message was sent by Atlassian Jira
(v8.20.7#820007)