Joe McDonnell created IMPALA-14973:
--------------------------------------
Summary: Crash when opening a ScannerContext::Stream on an Iceberg table
Key: IMPALA-14973
URL: https://issues.apache.org/jira/browse/IMPALA-14973
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 5.0.0
Reporter: Joe McDonnell
On a cluster, we observed a crash with this stack trace:
{noformat}
#0 0x0000000001c79638 in impala::ScannerContext::Stream::Stream
(this=0x180d29b80, parent=0x18f77140, scan_range=0x1a0b81d40,
reservation=8388608, file_desc=0x0) at scanner-context.cc:86
#1 0x0000000001c7b290 in impala::ScannerContext::AddStream
(this=this@entry=0x18f77140, range=0x1a0b81d40, reservation=8388608) at
scanner-context.cc:91
#2 0x0000000001c2a0a0 in impala::HdfsScanNodeMt::GetNext (this=0x2e172000,
state=<optimized out>, row_batch=0x2a76cdc0, eos=0x39992b01) at
../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:103
#3 0x0000000001d02908 in impala::StreamingAggregationNode::GetRowsStreaming
(this=this@entry=0x39992900, state=state@entry=0x7a3c8000,
out_batch=out_batch@entry=0x7925fe00)
at
/grid/0/jenkins/workspace/workspace/CDWH-parallel-redhat8/SOURCES/impala_arm/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:173
#4 0x0000000001d034ac in impala::StreamingAggregationNode::GetNext
(this=0x39992900, state=0x7a3c8000, row_batch=0x7925fe00, eos=0xfffe427bff77)
at streaming-aggregation-node.cc:77
#5 0x00000000014573f0 in impala::FragmentInstanceState::ExecInternal
(this=this@entry=0x1a00afd40) at
../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:109
#6 0x0000000001458ce0 in impala::FragmentInstanceState::Exec
(this=this@entry=0x1a00afd40) at fragment-instance-state.cc:104
#7 0x00000000013ed280 in impala::QueryState::ExecFInstance (this=0x1c0ae000,
fis=0x1a00afd40) at query-state.cc:1013
#8 0x0000000001b13998 in boost::function0<void>::operator() (this=0xb9ce0890)
at
../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:763
#9 impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
const&, boost::function<void ()> const&, impala::ThreadDebugInfo const*,
impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=...,
functor=..., parent_thread_info=<optimized out>, thread_started=0xfffe3f5b9a30)
at thread.cc:360
#10 0x00000000023bbcb8 in boost::(anonymous namespace)::thread_proxy
(param=0xb9ce0700) at libs/thread/src/pthread/thread.cpp:179
#11 0x0000ffffb97968b8 in start_thread () from /lib64/libpthread.so.0
#12 0x0000ffffb78c1afc in removexattr () from /lib64/libc.so.6
{noformat}
The suspicious part is file_desc=0x0 in frame #0. This suggests that
ScanRangeSharedState::GetFileDesc() returned null. That could happen if it was
called with a partition_id or filename that is not present in file_descs_. On a
debug build the failed lookup would hit a DCHECK, but on a release build it
returns null.
This has not reproduced so far. We need to try to reproduce it and find the
root cause. At the very least, this code path needs better diagnostics so that
we have more information if it happens again.
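
As a rough illustration of the suspected failure mode and of one possible form
the extra diagnostics could take, here is a minimal, self-contained sketch. It
is not Impala's code: SharedStateSketch, FileDescSketch, NumFileDescs() and
LookupWithDiagnostics() are hypothetical names, and the map keyed on
(partition_id, filename) is an assumption about how file_descs_ is organized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a file descriptor; not Impala's real type.
struct FileDescSketch {
  std::string filename;
  int64_t file_length;
};

// Hypothetical, simplified model of the shared lookup table.
class SharedStateSketch {
 public:
  void AddFileDesc(int64_t partition_id, const std::string& filename,
                   int64_t file_length) {
    file_descs_[std::make_pair(partition_id, filename)] =
        FileDescSketch{filename, file_length};
  }

  // Returns nullptr when the key is unknown. This models the release-build
  // behavior described above; a debug build would stop at a DCHECK instead.
  const FileDescSketch* GetFileDesc(int64_t partition_id,
                                    const std::string& filename) const {
    auto it = file_descs_.find(std::make_pair(partition_id, filename));
    return it == file_descs_.end() ? nullptr : &it->second;
  }

  std::size_t NumFileDescs() const { return file_descs_.size(); }

 private:
  std::map<std::pair<int64_t, std::string>, FileDescSketch> file_descs_;
};

// Looks up the descriptor and, on a miss, fills *error with the lookup key
// so the caller can report it instead of dereferencing a null pointer.
bool LookupWithDiagnostics(const SharedStateSketch& state, int64_t partition_id,
                           const std::string& filename,
                           const FileDescSketch** file_desc,
                           std::string* error) {
  assert(file_desc != nullptr && error != nullptr);
  *file_desc = state.GetFileDesc(partition_id, filename);
  if (*file_desc != nullptr) return true;
  std::ostringstream ss;
  ss << "No file descriptor for partition_id=" << partition_id
     << " filename=" << filename
     << " (known descriptors: " << state.NumFileDescs() << ")";
  *error = ss.str();
  return false;
}
```

With a check of this kind, a caller could log the message and fail the query
with an error rather than passing a null file_desc further down. Whether that
is the right place for the check in the real scanner code is an open question.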