[ 
https://issues.apache.org/jira/browse/IMPALA-12123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721903#comment-17721903
 ] 

ASF subversion and git services commented on IMPALA-12123:
----------------------------------------------------------

Commit 017175558341204bc32dd7998b245a12995234d7 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=017175558 ]

IMPALA-12123: Fix crash triggered by incomplete HDFS cache reads

When using HDFS caching, the HDFS cache may not have the full
buffer in memory, and it can return a buffer that is incomplete.
In this case, the code falls back to the ordinary read path.
However, the ScanRange cache_ structure is still set up, and
the code in ScanRange::ReadSubRanges() tries to use it. This
can crash, because the buffer is too short (and may have been
freed).

This changes the code to null out the cache_ data structure
when there is an incomplete read from the HDFS cache.
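
As an illustration only, the idea can be sketched with a small
standalone model. This is not the Impala code: ToyScanRange,
CacheHandle, ReadFromCache() and has_cache() are hypothetical names
invented for the sketch, and the real ScanRange differs in its details.
It shows the cache state being cleared when the cached buffer is
shorter than expected, so that the later sub-range copy uses the
ordinary read path instead of a stale pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cached-buffer state kept by a scan range.
struct CacheHandle {
  const uint8_t* data = nullptr;  // non-null only while a complete buffer is held
  int64_t len = 0;
};

struct SubRange {
  int64_t offset;
  int64_t length;
};

// Simplified model of a scan range; names and layout are assumptions.
class ToyScanRange {
 public:
  ToyScanRange(int64_t expected_len, std::vector<SubRange> sub_ranges)
    : expected_len_(expected_len), sub_ranges_(std::move(sub_ranges)) {}

  // Tries to adopt a cached buffer. Returns true only for a complete read.
  // On an incomplete read the cache state is reset, so later code cannot
  // dereference a buffer that is too short or already released.
  bool ReadFromCache(const uint8_t* cached, int64_t cached_len) {
    cache_.data = cached;
    cache_.len = cached_len;
    if (cached == nullptr || cached_len < expected_len_) {
      cache_ = CacheHandle();  // null out the cache state before falling back
      return false;
    }
    return true;
  }

  bool has_cache() const { return cache_.data != nullptr; }

  // Copies every sub-range into one output buffer. Reads from the cached
  // buffer when one is held, otherwise from the (in-memory) file contents.
  // Assumes all sub-ranges lie within the source buffer.
  std::vector<uint8_t> ReadSubRanges(const std::vector<uint8_t>& file) const {
    const uint8_t* src = has_cache() ? cache_.data : file.data();
    std::vector<uint8_t> out;
    for (const SubRange& sr : sub_ranges_) {
      out.insert(out.end(), src + sr.offset, src + sr.offset + sr.length);
    }
    return out;
  }

 private:
  int64_t expected_len_;
  std::vector<SubRange> sub_ranges_;
  CacheHandle cache_;
};
```

In this model, a short cached buffer leaves has_cache() false, and
ReadSubRanges() then copies from the file contents only.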

Testing:
 - Reproduced the crash stack manually by putting a Parquet
   file with a page index in HDFS cache and manually forcing
   it down the incomplete read codepath.
 - Modified the disk-io-mgr-test and CacheReaderTestStub to
   simulate the incomplete read case. The test will hit a
   DCHECK or crash without this fixup.

Change-Id: I51d8be6c03716badee81675447ed94ae6249b21b
Reviewed-on: http://gerrit.cloudera.org:8080/19869
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>


> SIGSEGV in ScanRange::ReadSubRanges() when using HDFS caching
> -------------------------------------------------------------
>
>                 Key: IMPALA-12123
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12123
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.3.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>             Fix For: Impala 4.3.0
>
>
> We have seen a crash where multiple executors hit this SIGSEGV simultaneously:
>  
> {noformat}
> #0  0x00007f42a5112cb5 in ?? ()
> #1  0x0000000001742dab in impala::io::ScanRange::ReadSubRanges 
> (this=this@entry=0x9d1bc940, queue=queue@entry=0x11c700a0, 
> buffer_desc=buffer_desc@entry=0x6f73f2c0, eof=eof@entry=0x7f39bbddb727) at 
> scan-range.cc:275
> #2  0x000000000174550b in impala::io::ScanRange::DoRead 
> (this=this@entry=0x9d1bc940, queue=queue@entry=0x11c700a0, disk_id=7) at 
> scan-range.cc:219
> #3  0x000000000173a0d6 in impala::io::DiskQueue::DiskThreadLoop 
> (this=0x11c700a0, io_mgr=0x1273e8c0) at disk-io-mgr.cc:504
> #4  0x00000000014b0355 in boost::function0<void>::operator() 
> (this=0x7f39bbddbb40) at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #5  impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., 
> functor=..., parent_thread_info=<optimized out>, 
> thread_started=0x7ffea5eca350) at thread.cc:360
> #6  0x00000000014b171b in 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> 
> >::operator()<void (*)(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*), 
> boost::_bi::list0>(boost::_bi::type<void>, void 
> (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), boost::_bi::list0&, int) (
>     a=<synthetic pointer>..., f=<error reading variable>, this=0x13b30800) at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
> #7  boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > 
> >::operator()() (this=0x13b307f8)
>     at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #8  boost::detail::thread_data<boost::_bi::bind_t<void, void 
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > 
> >::run() (this=0x13b30640)
>     at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
> #9  0x0000000001caf602 in thread_proxy ()
> #10 0x00007f42a8420ea5 in ?? ()
> #11 0x0000000000000000 in ?? (){noformat}
> The error reported is:
>  
>  
> {noformat}
> Crash reason:  SIGSEGV
> Crash address: 0x7f3903c7f438
> Process uptime: not available{noformat}
> We are working on finding details about the query that hit this.
>  
> This corresponds to this line of code:
>  
> {noformat}
> Status ScanRange::ReadSubRanges(
>     DiskQueue* queue, BufferDescriptor* buffer_desc, bool* eof, FileReader* file_reader) {
>   buffer_desc->len_ = 0;
>   while (buffer_desc->len() < buffer_desc->buffer_len()
>       && sub_range_pos_.index < sub_ranges_.size()) {
>     SubRange& sub_range = sub_ranges_[sub_range_pos_.index];
>     int64_t offset = sub_range.offset + sub_range_pos_.bytes_read;
>     int64_t bytes_to_read = min(sub_range.length - sub_range_pos_.bytes_read,
>         buffer_desc->buffer_len() - buffer_desc->len());
>     if (cache_.data != nullptr) {
>       memcpy(buffer_desc->buffer_ + buffer_desc->len(),
>           cache_.data + offset, bytes_to_read); <<< HERE
>     } else {{noformat}
> This is reading from HDFS caching.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
