[https://issues.apache.org/jira/browse/IMPALA-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850443#comment-17850443]
Michael Smith commented on IMPALA-13107:
----------------------------------------
This seems like a bug in either kRPC or our handling of kRPC errors. We should
try to follow up on that (even if we mitigate this so it doesn't crash Impala).
I'm concerned that other cases could cause incomplete messages that we don't
handle correctly.
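One possible shape for such a mitigation, sketched under stated assumptions:
sanity-check the deserialized request before building any fragment state, and
fail the query with an error instead of crashing. The types `FragmentCtx` and
`FragmentInfo` and the function `ValidateFragmentInfo` are hypothetical,
simplified stand-ins, not Impala's real TExecPlanFragmentInfo or
PlanFragmentCtxPB classes. The assumption that the fragment list and the
fragment-context list are consumed in parallel is also unverified here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the request payload. These are NOT
// Impala's real Thrift/protobuf types; they only model the fields the check uses.
struct FragmentCtx {
  int fragment_idx = 0;   // index into FragmentInfo::fragments (assumed)
  int num_instances = 0;  // instances of this fragment on this executor (assumed)
};

struct FragmentInfo {
  std::vector<int> fragments;              // stand-in for the plan fragments
  std::vector<FragmentCtx> fragment_ctxs;  // assumed parallel to `fragments`
};

// Returns an empty string if the request looks consistent, otherwise a
// description of the first problem found. A caller would turn a non-empty
// result into a query error instead of constructing fragment state from it.
std::string ValidateFragmentInfo(const FragmentInfo& info) {
  if (info.fragment_ctxs.empty()) return "request contains no fragment contexts";
  if (info.fragment_ctxs.size() != info.fragments.size()) {
    return "fragment count (" + std::to_string(info.fragments.size()) +
           ") does not match fragment context count (" +
           std::to_string(info.fragment_ctxs.size()) + ")";
  }
  for (const FragmentCtx& ctx : info.fragment_ctxs) {
    // Reject indexes that would read outside the fragment list.
    if (ctx.fragment_idx < 0 ||
        static_cast<std::size_t>(ctx.fragment_idx) >= info.fragments.size()) {
      return "fragment index " + std::to_string(ctx.fragment_idx) + " out of range";
    }
    // A fragment sent to an executor should have at least one instance.
    if (ctx.num_instances <= 0) {
      return "fragment " + std::to_string(ctx.fragment_idx) + " has " +
             std::to_string(ctx.num_instances) + " instances";
    }
  }
  return "";
}
```

A check like this only catches inconsistencies that are visible in the decoded
fields; it does not explain why kRPC delivered an incomplete message in the
first place.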
> Invalid TExecPlanFragmentInfo received by executor with instance number as 0
> ----------------------------------------------------------------------------
>
> Key: IMPALA-13107
> URL: https://issues.apache.org/jira/browse/IMPALA-13107
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> In a customer-reported case, executors received a TExecPlanFragmentInfo with
> the number of instances equal to 0, which caused the Impala daemon to crash.
> Here are the log messages collected on the Impala executors:
> {code:java}
> impalad.executor.net.impala.log.INFO.20240522-160138.197583:I0523 00:59:16.892853 199528 control-service.cc:148] 624c47e9264ebb62:5aa89af300000000] ExecQueryFInstances(): query_id=624c47e9264ebb62:5aa89af300000000 coord=coordinator.net:27000 #instances=0
> ......
> I0523 00:59:19.306522 199185 kMinidump in thread [1890723]query-state-624c47e9264ebb62:5aa89af300000000 running query 624c47e9264ebb62:5aa89af300000000, fragment instance 0000000000000000:0000000000000000
> Wrote minidump to /var/log/impala-minidumps/impalad/021b06ea-1627-4c69-9f27858a-f3cd9026.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00000000012ff9d9, pid=197583, tid=0x00007eefc98a0700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_381) (build 1.8.0_381-b09)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.381-b09 mixed mode linux-amd64 )
> # Problematic frame:
> # C [impalad+0xeff9d9] impala::FragmentState::FragmentState(impala::QueryState*, impala::TPlanFragment const&, impala::PlanFragmentCtxPB const&)+0xf9
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> {code}
> According to the collected profiles, no fragment in the corresponding query
> plan had 0 instances, so the coordinator should not have sent the executor
> fragments with 0 instances. The executor log files showed many KRPC errors
> around the time the invalid TExecPlanFragmentInfo was received. It seems that
> KRPC messages were truncated because of KRPC failures, but the truncation
> might not have caused a Thrift deserialization error. The invalid
> TExecPlanFragmentInfo caused the Impala daemon to crash with the following
> stack trace when the query was started on the executor:
> {code:java}
> #0 SubstituteArg (value=..., this=0x7f86cec79d30) at ../gutil/strings/substitute.h:79
> #1 impala::FragmentState::FragmentState (this=0x35c78f40, query_state=0x7972db00, fragment=..., fragment_ctx=<error reading variable: Cannot access memory at address 0x35c78f88>) at fragment-state.cc:143
> #2 0x00000000013019aa in impala::FragmentState::CreateFragmentStateMap (fragment_info=..., exec_request=..., state=state@entry=0x7972db00, fragment_map=...) at fragment-state.cc:47
> #3 0x0000000001292d71 in impala::QueryState::StartFInstances (this=this@entry=0x7972db00) at query-state.cc:820
> #4 0x0000000001284810 in impala::QueryExecMgr::ExecuteQueryHelper (this=0x11943b00, qs=0x7972db00) at query-exec-mgr.cc:162
> #5 0x0000000001752915 in operator() (this=0x7f86cec7ab40) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #6 impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., functor=..., parent_thread_info=<optimized out>, thread_started=0x7f87b7b9acb0) at thread.cc:360
> #7 0x0000000001753c9b in operator()<void (*)(const std::__cxx11::basic_string<char>&, const std::__cxx11::basic_string<char>&, boost::function<void()>, const impala::ThreadDebugInfo*, impala::Promise<long int>*), boost::_bi::list0> (a=<synthetic pointer>, f=@0x1f66f3b8: <error reading variable>, this=0x1f66f3c0) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
> #8 operator() (this=0x1f66f3b8) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #9 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() (this=0x1f66f200) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
> #10 0x0000000001fb4322 in thread_proxy ()
> #11 0x00007f98af288ea5 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007f98ac2dfb0d in gnu_dev_makedev () from /lib64/libc.so.6
> #13 0x0000000000000000 in ?? ()
> {code}
> Note that this issue happened when extra load was added to the Impala
> cluster, which caused a large number of RPC failures.
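The hypothesis in the description, that a truncated message might deserialize
without an error, can be illustrated with a toy tag/value format. This is a
hypothetical encoding, not Thrift or protobuf: each field is a one-byte tag
followed by a one-byte value, and tag 0 marks the end of the message. A decoder
that simply stops at the end of the buffer accepts a message cut at a field
boundary and leaves the missing `num_instances` at its default of 0, while a
decoder that requires the end marker rejects it. Whether the real KRPC payloads
behave this way is an assumption, not something the logs establish. The sketch
needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy message: fields absent from the buffer keep their default of 0.
struct ToyFragmentInfo {
  std::uint8_t query_seq = 0;      // tag 1
  std::uint8_t num_instances = 0;  // tag 2
};

constexpr std::uint8_t kTagStop = 0;  // explicit end-of-message marker
constexpr std::uint8_t kTagQuerySeq = 1;
constexpr std::uint8_t kTagNumInstances = 2;

// Decodes (tag, value) pairs. With require_stop == false the decoder is
// lenient: running out of bytes at a pair boundary counts as success.
std::optional<ToyFragmentInfo> Decode(const std::vector<std::uint8_t>& buf,
                                      bool require_stop) {
  ToyFragmentInfo msg;
  std::size_t pos = 0;
  while (pos < buf.size()) {
    const std::uint8_t tag = buf[pos];
    if (tag == kTagStop) return msg;                 // clean end of message
    if (pos + 1 >= buf.size()) return std::nullopt;  // tag without a value
    const std::uint8_t value = buf[pos + 1];
    if (tag == kTagQuerySeq) {
      msg.query_seq = value;
    } else if (tag == kTagNumInstances) {
      msg.num_instances = value;
    } else {
      return std::nullopt;  // unknown tag
    }
    pos += 2;
  }
  // The buffer ended without a stop marker.
  if (require_stop) return std::nullopt;
  return msg;
}
```

For example, the full encoding `{1, 7, 2, 4, 0}` decodes to `query_seq=7,
num_instances=4` either way, but the truncated buffer `{1, 7}` decodes to
`num_instances=0` in lenient mode and fails in strict mode. For comparison,
Thrift's binary protocol ends each struct with a STOP field, whereas the
protobuf wire format is not self-delimiting, so a top-level protobuf message
cut exactly at a field boundary can parse cleanly with the remaining fields
left unset.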
--
This message was sent by Atlassian Jira
(v8.20.10#820010)