jorisvandenbossche commented on PR #37854: URL: https://github.com/apache/arrow/pull/37854#issuecomment-1750503852
This PR introduced a failure in the "AMD64 Ubuntu 22.04 C++ ASAN UBSAN" build (https://github.com/apache/arrow/actions/runs/6430392691/job/17462667620?pr=38069#logs), related to the LazyCache coalesced reads; see the details below. I assume this is an existing bug, given that this PR only changed the default of an option that users could already set themselves. Changing the default of course makes it more visible, though.

A potential short-term option is to only change `pre_buffer` and keep the current non-lazy default for `cache_options` (if that fixes it). Alternatively, we could revert the PR entirely until this is resolved (I don't have time today to look into this in more detail).

<details>

```
2023-10-06T10:40:14.0622194Z Running: /arrow/testing/data/parquet/fuzzing/clusterfuzz-testcase-minimized-parquet-arrow-fuzz-5640198106120192
2023-10-06T10:40:14.0651320Z /arrow/cpp/src/arrow/io/interfaces.cc:457: Check failed: (left.offset + left.length) <= (right.offset) Some read ranges overlap
2023-10-06T10:40:14.0661169Z /build/cpp/debug/parquet-arrow-fuzz(backtrace+0x5b)[0x55893309d6bb]
2023-10-06T10:40:14.0678721Z /usr/local/lib/libarrow.so.1400(_ZN5arrow4util7CerrLog14PrintBackTraceEv+0x1a5)[0x7fd67d9f5405]
2023-10-06T10:40:14.0694280Z /usr/local/lib/libarrow.so.1400(_ZN5arrow4util7CerrLogD2Ev+0x1f7)[0x7fd67d9f5177]
2023-10-06T10:40:14.0708313Z /usr/local/lib/libarrow.so.1400(_ZN5arrow4util7CerrLogD0Ev+0x61)[0x7fd67d9f5251]
2023-10-06T10:40:14.0722939Z /usr/local/lib/libarrow.so.1400(_ZN5arrow4util8ArrowLogD1Ev+0x1d0)[0x7fd67d9f4d80]
2023-10-06T10:40:14.0733586Z /usr/local/lib/libarrow.so.1400(+0xb13f151)[0x7fd67d3cc151]
2023-10-06T10:40:14.0746700Z /usr/local/lib/libarrow.so.1400(_ZN5arrow2io8internal18CoalesceReadRangesESt6vectorINS0_9ReadRangeESaIS3_EEll+0x4c1)[0x7fd67d3cac81]
2023-10-06T10:40:14.0762388Z /usr/local/lib/libarrow.so.1400(_ZN5arrow2io8internal14ReadRangeCache4Impl5CacheESt6vectorINS0_9ReadRangeESaIS5_EE+0x456)[0x7fd67d2c3be6]
2023-10-06T10:40:14.0775666Z /usr/local/lib/libarrow.so.1400(_ZN5arrow2io8internal14ReadRangeCache8LazyImpl5CacheESt6vectorINS0_9ReadRangeESaIS5_EE+0x24a)[0x7fd67d2c1cca]
2023-10-06T10:40:14.0790164Z /usr/local/lib/libarrow.so.1400(_ZN5arrow2io8internal14ReadRangeCache5CacheESt6vectorINS0_9ReadRangeESaIS4_EE+0x2a2)[0x7fd67d2bfec2]
2023-10-06T10:40:14.0795950Z /usr/local/lib/libparquet.so.1400(_ZN7parquet14SerializedFile9PreBufferERKSt6vectorIiSaIiEES5_RKN5arrow2io9IOContextERKNS7_12CacheOptionsE+0x1696)[0x7fd69120ef96]
2023-10-06T10:40:14.0801581Z /usr/local/lib/libparquet.so.1400(_ZN7parquet17ParquetFileReader9PreBufferERKSt6vectorIiSaIiEES5_RKN5arrow2io9IOContextERKNS7_12CacheOptionsE+0x360)[0x7fd69120d7c0]
2023-10-06T10:40:14.0808329Z /usr/local/lib/libparquet.so.1400(+0x15435e5)[0x7fd6904885e5]
2023-10-06T10:40:14.0808759Z /usr/local/lib/libparquet.so.1400(+0x1542728)[0x7fd690487728]
2023-10-06T10:40:14.0815343Z /usr/local/lib/libparquet.so.1400(+0x1542c7c)[0x7fd690487c7c]
2023-10-06T10:40:14.0816050Z /usr/local/lib/libparquet.so.1400(_ZN7parquet5arrow8internal10FuzzReaderESt10unique_ptrINS0_10FileReaderESt14default_deleteIS3_EE+0x3e2)[0x7fd69046cdf2]
2023-10-06T10:40:14.0822733Z ==14349== ERROR: libFuzzer: deadly signal
2023-10-06T10:40:14.0823311Z /usr/local/lib/libparquet.so.1400(_ZN7parquet5arrow8internal10FuzzReaderEPKhl+0x1130)[0x7fd69046e950]
2023-10-06T10:40:14.0824114Z /build/cpp/debug/parquet-arrow-fuzz(+0x118e98)[0x558933121e98]
2023-10-06T10:40:14.0825448Z /build/cpp/debug/parquet-arrow-fuzz(+0x3f354)[0x558933048354]
2023-10-06T10:40:14.0826059Z /build/cpp/debug/parquet-arrow-fuzz(+0x290d0)[0x5589330320d0]
2023-10-06T10:40:14.0826543Z /build/cpp/debug/parquet-arrow-fuzz(+0x2ee27)[0x558933037e27]
2023-10-06T10:40:14.0827941Z /build/cpp/debug/parquet-arrow-fuzz(+0x58c43)[0x558933061c43]
2023-10-06T10:40:14.0828405Z /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fd6713bfd90]
2023-10-06T10:40:14.0828882Z /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fd6713bfe40]
2023-10-06T10:40:14.0829351Z /build/cpp/debug/parquet-arrow-fuzz(+0x23995)[0x55893302c995]
2023-10-06T10:40:15.2094786Z #0 0x5589330eeab1 in __sanitizer_print_stack_trace (/build/cpp/debug/parquet-arrow-fuzz+0xe5ab1) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2096115Z #1 0x558933061348 in fuzzer::PrintStackTrace() (/build/cpp/debug/parquet-arrow-fuzz+0x58348) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2097546Z #2 0x558933046dc3 in fuzzer::Fuzzer::CrashCallback() (/build/cpp/debug/parquet-arrow-fuzz+0x3ddc3) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2098544Z #3 0x7fd6713d851f (/lib/x86_64-linux-gnu/libc.so.6+0x4251f) (BuildId: 229b7dc509053fe4df5e29e8629911f0c3bc66dd)
2023-10-06T10:40:15.2099481Z #4 0x7fd67142ca7b in pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x96a7b) (BuildId: 229b7dc509053fe4df5e29e8629911f0c3bc66dd)
2023-10-06T10:40:15.2101878Z #5 0x7fd6713d8475 in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42475) (BuildId: 229b7dc509053fe4df5e29e8629911f0c3bc66dd)
2023-10-06T10:40:15.2102783Z #6 0x7fd6713be7f2 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f2) (BuildId: 229b7dc509053fe4df5e29e8629911f0c3bc66dd)
2023-10-06T10:40:15.2103486Z #7 0x7fd67d9f5193 in arrow::util::CerrLog::~CerrLog() /arrow/cpp/src/arrow/util/logging.cc:72:7
2023-10-06T10:40:15.2104144Z #8 0x7fd67d9f5250 in arrow::util::CerrLog::~CerrLog() /arrow/cpp/src/arrow/util/logging.cc:66:22
2023-10-06T10:40:15.2104793Z #9 0x7fd67d9f4d7f in arrow::util::ArrowLog::~ArrowLog() /arrow/cpp/src/arrow/util/logging.cc:250:5
2023-10-06T10:40:15.2105719Z #10 0x7fd67d3cc150 in arrow::io::internal::(anonymous namespace)::ReadRangeCombiner::Coalesce(std::vector<arrow::io::ReadRange, std::allocator<arrow::io::ReadRange> >) /arrow/cpp/src/arrow/io/interfaces.cc:457:7
2023-10-06T10:40:15.2106830Z #11 0x7fd67d3cac80 in arrow::io::internal::CoalesceReadRanges(std::vector<arrow::io::ReadRange, std::allocator<arrow::io::ReadRange> >, long, long) /arrow/cpp/src/arrow/io/interfaces.cc:518:19
2023-10-06T10:40:15.2107880Z #12 0x7fd67d2c3be5 in arrow::io::internal::ReadRangeCache::Impl::Cache(std::vector<arrow::io::ReadRange, std::allocator<arrow::io::ReadRange> >) /arrow/cpp/src/arrow/io/caching.cc:177:14
2023-10-06T10:40:15.2108897Z #13 0x7fd67d2c1cc9 in arrow::io::internal::ReadRangeCache::LazyImpl::Cache(std::vector<arrow::io::ReadRange, std::allocator<arrow::io::ReadRange> >) /arrow/cpp/src/arrow/io/caching.cc:288:34
2023-10-06T10:40:15.2109909Z #14 0x7fd67d2bfec1 in arrow::io::internal::ReadRangeCache::Cache(std::vector<arrow::io::ReadRange, std::allocator<arrow::io::ReadRange> >) /arrow/cpp/src/arrow/io/caching.cc:320:17
2023-10-06T10:40:15.2111039Z #15 0x7fd69120ef95 in parquet::SerializedFile::PreBuffer(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::io::IOContext const&, arrow::io::CacheOptions const&) /arrow/cpp/src/parquet/file_reader.cc:368:5
2023-10-06T10:40:15.2112348Z #16 0x7fd69120d7bf in parquet::ParquetFileReader::PreBuffer(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::io::IOContext const&, arrow::io::CacheOptions const&) /arrow/cpp/src/parquet/file_reader.cc:862:9
2023-10-06T10:40:15.2113660Z #17 0x7fd6904885e4 in parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) /arrow/cpp/src/parquet/arrow/reader.cc:1224:23
2023-10-06T10:40:15.2114817Z #18 0x7fd690487727 in parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroup(int, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) /arrow/cpp/src/parquet/arrow/reader.cc:321:12
2023-10-06T10:40:15.2115872Z #19 0x7fd690487c7b in parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroup(int, std::shared_ptr<arrow::Table>*) /arrow/cpp/src/parquet/arrow/reader.cc:325:12
2023-10-06T10:40:15.2116737Z #20 0x7fd69046cdf1 in parquet::arrow::internal::FuzzReader(std::unique_ptr<parquet::arrow::FileReader, std::default_delete<parquet::arrow::FileReader> >) /arrow/cpp/src/parquet/arrow/reader.cc:1374:37
2023-10-06T10:40:15.2117736Z #21 0x7fd69046e94f in parquet::arrow::internal::FuzzReader(unsigned char const*, long) /arrow/cpp/src/parquet/arrow/reader.cc:1399:11
2023-10-06T10:40:15.2118358Z #22 0x558933121e97 in LLVMFuzzerTestOneInput /arrow/cpp/src/parquet/arrow/fuzz.cc:22:17
2023-10-06T10:40:15.2119357Z #23 0x558933048353 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/build/cpp/debug/parquet-arrow-fuzz+0x3f353) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2120490Z #24 0x5589330320cf in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/build/cpp/debug/parquet-arrow-fuzz+0x290cf) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2121762Z #25 0x558933037e26 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/build/cpp/debug/parquet-arrow-fuzz+0x2ee26) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2122720Z #26 0x558933061c42 in main (/build/cpp/debug/parquet-arrow-fuzz+0x58c42) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2123509Z #27 0x7fd6713bfd8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 229b7dc509053fe4df5e29e8629911f0c3bc66dd)
2023-10-06T10:40:15.2124294Z #28 0x7fd6713bfe3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 229b7dc509053fe4df5e29e8629911f0c3bc66dd)
2023-10-06T10:40:15.2125121Z #29 0x55893302c994 in _start (/build/cpp/debug/parquet-arrow-fuzz+0x23994) (BuildId: 8286aad552d39ef7fd5d08d745adab7f6b613e22)
2023-10-06T10:40:15.2125489Z
2023-10-06T10:40:15.2126159Z NOTE: libFuzzer has rudimentary signal handlers.
2023-10-06T10:40:15.2127161Z Combine libFuzzer with AddressSanitizer or similar for better crash reports.
2023-10-06T10:40:15.2127655Z SUMMARY: libFuzzer: deadly signal
2023-10-06T10:40:16.9350640Z 77
2023-10-06T10:40:17.0185097Z Error: `docker-compose --file /home/runner/work/arrow/arrow/docker-compose.yml run --rm ubuntu-cpp-sanitizer` exited with a non-zero exit code 77, see the process log above.
```

</details>
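
For reference, the check that fails in the log, `(left.offset + left.length) <= (right.offset)`, requires that no two read ranges overlap once they are ordered by offset. Below is a minimal Python sketch of that invariant only. It is not Arrow's implementation (that lives in `interfaces.cc` per the backtrace), and the `ReadRange` tuple and `find_overlap` helper are hypothetical stand-ins written for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class ReadRange(NamedTuple):
    """Hypothetical stand-in for a byte range: start offset plus length."""
    offset: int
    length: int


def find_overlap(ranges: List[ReadRange]) -> Optional[Tuple[ReadRange, ReadRange]]:
    """Return the first overlapping neighbouring pair after sorting by offset.

    Returns None when every range ends at or before the next one starts.
    """
    ordered = sorted(ranges, key=lambda r: r.offset)
    for left, right in zip(ordered, ordered[1:]):
        # Same condition as in the failed check: the left range must end
        # at or before the offset where the right range begins.
        if not (left.offset + left.length <= right.offset):
            return left, right
    return None


# Touching ranges are fine; only a true overlap violates the condition.
disjoint = [ReadRange(20, 4), ReadRange(0, 10), ReadRange(10, 5)]
overlapping = [ReadRange(8, 5), ReadRange(0, 10)]

no_conflict = find_overlap(disjoint)
conflict = find_overlap(overlapping)
```

Comparing only neighbouring pairs is enough here: after sorting by offset, any two ranges that overlap imply that some adjacent pair overlaps as well.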
