Joe McDonnell created IMPALA-14973:
--------------------------------------

             Summary: Crash when opening a ScannerContext::Stream on an Iceberg table
                 Key: IMPALA-14973
                 URL: https://issues.apache.org/jira/browse/IMPALA-14973
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


On a cluster, we observed a crash with this stack trace:
{noformat}
#0  0x0000000001c79638 in impala::ScannerContext::Stream::Stream (this=0x180d29b80, parent=0x18f77140, scan_range=0x1a0b81d40, reservation=8388608, file_desc=0x0) at scanner-context.cc:86
#1  0x0000000001c7b290 in impala::ScannerContext::AddStream (this=this@entry=0x18f77140, range=0x1a0b81d40, reservation=8388608) at scanner-context.cc:91
#2  0x0000000001c2a0a0 in impala::HdfsScanNodeMt::GetNext (this=0x2e172000, state=<optimized out>, row_batch=0x2a76cdc0, eos=0x39992b01) at ../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:103
#3  0x0000000001d02908 in impala::StreamingAggregationNode::GetRowsStreaming (this=this@entry=0x39992900, state=state@entry=0x7a3c8000, out_batch=out_batch@entry=0x7925fe00) at /grid/0/jenkins/workspace/workspace/CDWH-parallel-redhat8/SOURCES/impala_arm/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:173
#4  0x0000000001d034ac in impala::StreamingAggregationNode::GetNext (this=0x39992900, state=0x7a3c8000, row_batch=0x7925fe00, eos=0xfffe427bff77) at streaming-aggregation-node.cc:77
#5  0x00000000014573f0 in impala::FragmentInstanceState::ExecInternal (this=this@entry=0x1a00afd40) at ../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:109
#6  0x0000000001458ce0 in impala::FragmentInstanceState::Exec (this=this@entry=0x1a00afd40) at fragment-instance-state.cc:104
#7  0x00000000013ed280 in impala::QueryState::ExecFInstance (this=0x1c0ae000, fis=0x1a00afd40) at query-state.cc:1013
#8  0x0000000001b13998 in boost::function0<void>::operator() (this=0xb9ce0890) at ../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:763
#9  impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()> const&, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., functor=..., parent_thread_info=<optimized out>, thread_started=0xfffe3f5b9a30) at thread.cc:360
#10 0x00000000023bbcb8 in boost::(anonymous namespace)::thread_proxy (param=0xb9ce0700) at libs/thread/src/pthread/thread.cpp:179
#11 0x0000ffffb97968b8 in start_thread () from /lib64/libpthread.so.0
#12 0x0000ffffb78c1afc in removexattr () from /lib64/libc.so.6
{noformat}
The file_desc=0x0 argument in frame #0 is suspicious. It suggests that 
ScanRangeSharedState::GetFileDesc() returned null. It looks like that could 
happen if it was called with a partition_id or filename that is not present in 
file_descs_. On a debug build, this would hit a DCHECK, but on a release build 
it would return null.
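
To illustrate the suspected failure mode, here is a minimal, self-contained 
sketch of a lookup that asserts on a miss in debug builds but silently returns 
null in release builds. It is not Impala's code: FileDesc, FileDescRegistry, 
SKETCH_DCHECK, and SKETCH_DEBUG_CHECKS are hypothetical names, and only the 
(partition_id, filename) key and the file_descs_ member name are taken from the 
description above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Debug-only check, loosely modeled on a DCHECK. It compiles to nothing unless
// SKETCH_DEBUG_CHECKS (a hypothetical flag for this sketch) is defined.
#ifdef SKETCH_DEBUG_CHECKS
#define SKETCH_DCHECK(cond) assert(cond)
#else
#define SKETCH_DCHECK(cond) ((void)0)
#endif

// Hypothetical stand-in for a per-file descriptor.
struct FileDesc {
  std::string filename;
  int64_t file_length;
};

// Hypothetical registry keyed by (partition_id, filename).
class FileDescRegistry {
 public:
  void Add(int64_t partition_id, const std::string& filename,
           int64_t file_length) {
    file_descs_[std::make_pair(partition_id, filename)] =
        FileDesc{filename, file_length};
  }

  // Returns nullptr when the key is absent. With debug checks enabled the miss
  // aborts immediately; without them the caller receives a null pointer.
  const FileDesc* Get(int64_t partition_id, const std::string& filename) const {
    auto it = file_descs_.find(std::make_pair(partition_id, filename));
    if (it == file_descs_.end()) {
      SKETCH_DCHECK(false);
      return nullptr;
    }
    return &it->second;
  }

 private:
  std::map<std::pair<int64_t, std::string>, FileDesc> file_descs_;
};
```

In this sketch, a caller that passes the result of Get() along without a null 
check only fails later, when the pointer is first used, which would be 
consistent with a crash inside a constructor that received file_desc=0x0.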

This has not reproduced so far. We need to try to reproduce it and find the 
root cause. At the very least, we should add better diagnostics so that we have 
more information if it happens again.
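
One possible shape for such diagnostics is a guard that runs before the stream 
is constructed and reports the lookup key instead of dereferencing a null 
pointer. This is a sketch under assumptions, not a proposed patch: OpenResult 
and CheckFileDesc are hypothetical names, the offset and len parameters are 
illustrative, and real Impala code would use its own status and logging 
facilities.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct OpenResult {
  bool ok;
  std::string error;
};

// Hypothetical guard: returns an error describing the scan range when the
// file descriptor lookup produced a null pointer.
OpenResult CheckFileDesc(const void* file_desc, int64_t partition_id,
                         const std::string& filename, int64_t offset,
                         int64_t len) {
  if (file_desc != nullptr) return {true, ""};
  std::ostringstream msg;
  msg << "No file descriptor for scan range:"
      << " partition_id=" << partition_id
      << " filename=" << filename
      << " offset=" << offset
      << " len=" << len;
  return {false, msg.str()};
}
```

Failing the query with a message like this would at least record which 
partition_id and filename missed the lookup the next time the problem occurs.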


