Hi,

> Then I went back to the pre-built binaries for 3.0.0 and 4.0.0 from JFrog
> and the issue reappeared. I can only infer that it has to do with the way
> the pre-built binaries are generated...

The pre-built binaries are the official RPM packages, right?

They are built with the default gcc-g++ package not g++ from
devtoolset-3. This may be related. Could you try building
your program with the default gcc-g++ package?

Thanks,
--
kou

In <calq9kxaxnyayqohuj3n0cknrbp6wbtxvj2pog7hcb0icy2r...@mail.gmail.com>
"Re: C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only" on
Wed, 9 Jun 2021 21:39:04 -0700,
Rares Vernica <rvern...@gmail.com> wrote:

> I got the apache-arrow-4.0.1 source and compiled it with the Debug flag. No
> segmentation fault occurred. I then removed the Debug flag and still no
> segmentation fault. I then tried the 4.0.0 source. Still no issues.
> Finally, I tried the 3.0.0 source and still no issues.
>
> Then I went back to the pre-built binaries for 3.0.0 and 4.0.0 from JFrog
> and the issue reappeared. I can only infer that it has to do with the way
> the pre-built binaries are generated...
>
> Here is how I compiled the Arrow sources on my CentOS 7.
>
> release$ cmake3 -DARROW_WITH_ZLIB=ON
> -DCMAKE_C_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/gcc
> -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/g++ ..
>
> Thanks,
> Rares
>
> On Tue, Jun 8, 2021 at 5:37 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
>> Hi,
>>
>> Could you try building Apache Arrow C++ with
>> -DCMAKE_BUILD_TYPE=Debug and get backtrace again? It will
>> show the source location on segmentation fault.
>>
>> Thanks,
>> --
>> kou
>>
>> In <calq9kxa8sh07shuckhka9fuzu2n87tbydlp--aahgcwkfwo...@mail.gmail.com>
>> "C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only" on
>> Tue, 8 Jun 2021 12:01:27 -0700,
>> Rares Vernica <rvern...@gmail.com> wrote:
>>
>> > Hello,
>> >
>> > We recently migrated our C++ Arrow code from 0.16 to 3.0.0. The code works
>> > fine on Ubuntu, but we get a segmentation fault in CentOS while reading
>> > Arrow Record Batch files.
>> > We can successfully read the files from Python or
>> > Ubuntu so the files and the writer are fine.
>> >
>> > We use Record Batch Stream Reader/Writer to read/write data to files.
>> > Sometimes we use GZIP to compress the streams. The migration to 3.0.0 was
>> > pretty straightforward, with minimal changes to the code:
>> > https://github.com/Paradigm4/bridge/commit/03e896e84230ddb41bfef68cde5ed9b21192a0e9
>> > We have an extensive test suite and all is good on Ubuntu. On CentOS the
>> > write works OK but we get a segmentation fault during reading from C++. We
>> > can successfully read the files using PyArrow. Moreover, the files written
>> > by CentOS can be successfully read from C++ in Ubuntu.
>> >
>> > Here is the backtrace I got from gdb when the segmentation fault occurred:
>> >
>> > Program received signal SIGSEGV, Segmentation fault.
>> > [Switching to Thread 0x7f548c7fb700 (LWP 2649)]
>> > 0x00007f545c003340 in ?? ()
>> > (gdb) bt
>> > #0 0x00007f545c003340 in ?? ()
>> > #1 0x00007f54903377ce in arrow::ipc::ArrayLoader::GetBuffer(int, std::shared_ptr<arrow::Buffer>*) () from /lib64/libarrow.so.300
>> > #2 0x00007f549034006c in arrow::Status arrow::VisitTypeInline<arrow::ipc::ArrayLoader>(arrow::DataType const&, arrow::ipc::ArrayLoader*) () from /lib64/libarrow.so.300
>> > #3 0x00007f5490340db4 in arrow::ipc::ArrayLoader::Load(arrow::Field const*, arrow::ArrayData*) () from /lib64/libarrow.so.300
>> > #4 0x00007f5490318b5b in arrow::ipc::LoadRecordBatchSubset(org::apache::arrow::flatbuf::RecordBatch const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool, std::allocator<bool> > const*, arrow::ipc::DictionaryMemo const*, arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion, arrow::Compression::type, arrow::io::RandomAccessFile*) () from /lib64/libarrow.so.300
>> > #5 0x00007f549031952a in arrow::ipc::LoadRecordBatch(org::apache::arrow::flatbuf::RecordBatch const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool, std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*, arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion, arrow::Compression::type, arrow::io::RandomAccessFile*) () from /lib64/libarrow.so.300
>> > #6 0x00007f54903197ce in arrow::ipc::ReadRecordBatchInternal(arrow::Buffer const&, std::shared_ptr<arrow::Schema> const&, std::vector<bool, std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*, arrow::ipc::IpcReadOptions const&, arrow::io::RandomAccessFile*) () from /lib64/libarrow.so.300
>> > #7 0x00007f5490345d9c in arrow::ipc::RecordBatchStreamReaderImpl::ReadNext(std::shared_ptr<arrow::RecordBatch>*) () from /lib64/libarrow.so.300
>> > #8 0x00007f549109b479 in scidb::ArrowReader::readObject (this=this@entry=0x7f548c7f7d80, name="index/0", reuse=reuse@entry=true, arrowBatch=std::shared_ptr (empty) 0x0) at XIndex.cpp:104
>> > #9 0x00007f549109cb0a in scidb::XIndex::load (this=this@entry=0x7f545c003ab0, driver=std::shared_ptr (count 3, weak 0) 0x7f545c003e70, query=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<scidb::Query, std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>'
>> > warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<scidb::Query, std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>'
>> > std::shared_ptr (count 7, weak 7) 0x7f546c005330) at XIndex.cpp:286
>> >
>> > I also tried Arrow 4.0.0. The code compiled just fine and the behavior was
>> > the same, with the same backtrace.
>> >
>> > The code where the segmentation fault occurs is trying to read a GZIP
>> > compressed Record Batch Stream. The file is 144 bytes and has only one
>> > column with three int64 values.
>> >
>> >> file 0
>> > 0: gzip compressed data, from Unix
>> >
>> >> stat 0
>> > File: ‘0’
>> > Size: 144 Blocks: 8 IO Block: 4096 regular file
>> > Device: 10302h/66306d Inode: 33715444 Links: 1
>> > Access: (0644/-rw-r--r--) Uid: ( 1001/ scidb) Gid: ( 1001/ scidb)
>> > Context: unconfined_u:object_r:user_tmp_t:s0
>> > Access: 2021-06-08 04:42:28.653548604 +0000
>> > Modify: 2021-06-08 04:14:14.638927052 +0000
>> > Change: 2021-06-08 04:40:50.221279208 +0000
>> > Birth: -
>> >
>> > In [29]: s = pyarrow.input_stream('/tmp/bridge/foo/index/0', compression='gzip')
>> > In [30]: b = pyarrow.RecordBatchStreamReader(s)
>> > In [31]: t = b.read_all()
>> > In [32]: t.columns
>> > Out[32]:
>> > [<pyarrow.lib.ChunkedArray object at 0x7fefb5a552b0>
>> > [
>> > [
>> > 0,
>> > 5,
>> > 10
>> > ]
>> > ]]
>> >
>> > I removed the GZIP compression in both the writer and the reader but the
>> > issue persists. So I don't think it is because of the compression.
>> >
>> > Here is the ldd on the library file which contains the reader and writers
>> > that use the Arrow library. It is built on a CentOS 7 with the g++ 4.9.2
>> > compiler.
>> >
>> >> ldd libbridge.so
>> > linux-vdso.so.1 => (0x00007fffe4f10000)
>> > libarrow.so.300 => /lib64/libarrow.so.300 (0x00007f8a38908000)
>> > libaws-cpp-sdk-s3.so => /opt/aws/lib64/libaws-cpp-sdk-s3.so (0x00007f8a384b3000)
>> > libm.so.6 => /lib64/libm.so.6 (0x00007f8a381b1000)
>> > librt.so.1 => /lib64/librt.so.1 (0x00007f8a37fa9000)
>> > libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a37da5000)
>> > libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8a37a9e000)
>> > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a37888000)
>> > libc.so.6 => /lib64/libc.so.6 (0x00007f8a374ba000)
>> > libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f8a37057000)
>> > libssl.so.10 => /lib64/libssl.so.10 (0x00007f8a36de5000)
>> > libbrotlienc.so.1 => /lib64/libbrotlienc.so.1 (0x00007f8a36b58000)
>> > libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007f8a3694b000)
>> > libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 (0x00007f8a3672b000)
>> > libutf8proc.so.1 => /lib64/libutf8proc.so.1 (0x00007f8a3647b000)
>> > libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f8a3626b000)
>> > liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f8a3605c000)
>> > libsnappy.so.1 => /lib64/libsnappy.so.1 (0x00007f8a35e56000)
>> > libz.so.1 => /lib64/libz.so.1 (0x00007f8a35c40000)
>> > libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f8a3593a000)
>> > libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a3571e000)
>> > /lib64/ld-linux-x86-64.so.2 (0x00007f8a39b67000)
>> > libaws-cpp-sdk-core.so => /opt/aws/lib64/libaws-cpp-sdk-core.so (0x00007f8a35413000)
>> > libaws-c-event-stream.so.0unstable => /opt/aws/lib64/libaws-c-event-stream.so.0unstable (0x00007f8a3520b000)
>> > libaws-c-common.so.0unstable => /opt/aws/lib64/libaws-c-common.so.0unstable (0x00007f8a34fd9000)
>> > libaws-checksums.so => /opt/aws/lib64/libaws-checksums.so (0x00007f8a34dce000)
>> > libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f8a34b81000)
>> > libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f8a34898000)
>> > libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f8a34694000)
>> > libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f8a34461000)
>> > libcurl.so.4 => /opt/curl/lib/libcurl.so.4 (0x00007f8a341ea000)
>> > libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f8a33fda000)
>> > libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f8a33dd6000)
>> > libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f8a33bbc000)
>> > libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f8a33995000)
>> > libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f8a33733000)
>> >
>> >> /opt/rh/devtoolset-3/root/usr/bin/g++ --version
>> > g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
>> >
>> > Do all of these ring any bells?
>> >
>> > Thank you!
>> > Rares
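
A quick way to check the compiler-mismatch question raised in this thread is to see which libarrow and libstdc++ the loader actually resolves for libbridge.so. The following is only a minimal sketch and is not part of the original messages: `parse_ldd`, `LDD_LINE`, and `SAMPLE` are hypothetical names, and the sample lines are copied from the ldd listing above.

```python
import re

# Hypothetical helper (not from the thread): parse `ldd`-style output into
# a {soname: resolved_path} mapping. Entries that have "=>" but no path,
# such as linux-vdso.so.1, map to None. Lines without "=>" (the dynamic
# loader itself) are skipped.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s*(\S*?)\s*\((0x[0-9a-fA-F]+)\)\s*$")


def parse_ldd(text):
    resolved = {}
    for line in text.splitlines():
        match = LDD_LINE.match(line)
        if match:
            soname, path, _addr = match.groups()
            resolved[soname] = path or None
    return resolved


# Sample lines copied from the ldd listing in the original report.
SAMPLE = """\
linux-vdso.so.1 => (0x00007fffe4f10000)
libarrow.so.300 => /lib64/libarrow.so.300 (0x00007f8a38908000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8a37a9e000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8a39b67000)
"""

libs = parse_ldd(SAMPLE)
for name in ("libarrow.so.300", "libstdc++.so.6"):
    print(name, "->", libs.get(name))
```

On the affected host the input would be the real output of `ldd libbridge.so`. In the report both libraries resolve under /lib64, that is, to the system packages, while libbridge.so itself was compiled with devtoolset-3 g++ 4.9.2.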