[
https://issues.apache.org/jira/browse/IMPALA-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenzhe Zhou updated IMPALA-13107:
---------------------------------
Description:
In a customer-reported case, executors received a TExecPlanFragmentInfo with
the number of instances equal to 0, which caused the Impala daemon to crash.
Here are the log messages collected on the Impala executors:
{code:java}
impalad.executor.net.impala.log.INFO.20240522-160138.197583:I0523
00:59:16.892853 199528 control-service.cc:148]
624c47e9264ebb62:5aa89af300000000] ExecQueryFInstances():
query_id=624c47e9264ebb62:5aa89af300000000 coord=coordinator.net:27000
#instances=0
......
I0523 00:59:19.306522 199185 kMinidump in thread
[1890723]query-state-624c47e9264ebb62:5aa89af300000000 running query
624c47e9264ebb62:5aa89af300000000, fragment instance
0000000000000000:0000000000000000
Wrote minidump to
/var/log/impala-minidumps/impalad/021b06ea-1627-4c69-9f27858a-f3cd9026.dmp
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00000000012ff9d9, pid=197583, tid=0x00007eefc98a0700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_381) (build 1.8.0_381-b09)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.381-b09 mixed mode linux-amd64
)
# Problematic frame:
# C [impalad+0xeff9d9]
impala::FragmentState::FragmentState(impala::QueryState*, impala::TPlanFragment
const&, impala::PlanFragmentCtxPB const&)+0xf9
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
{code}
From the collected profiles, no fragment in the corresponding query plan had 0
instances, so the coordinator should not have sent fragments to an executor
with the number of instances as 0. Executor log files showed many KRPC errors
around the time the invalid TExecPlanFragmentInfo was received. It seems the
KRPC messages were truncated due to KRPC failures, and the truncation did not
necessarily cause a Thrift deserialization error. The invalid
TExecPlanFragmentInfo caused the Impala daemon to crash with the following
stack trace when the query was started on the executor.
{code:java}
#0 SubstituteArg (value=..., this=0x7f86cec79d30) at
../gutil/strings/substitute.h:79
#1 impala::FragmentState::FragmentState (this=0x35c78f40,
query_state=0x7972db00, fragment=...,
fragment_ctx=<error reading variable: Cannot access memory at address
0x35c78f88>) at fragment-state.cc:143
#2 0x00000000013019aa in impala::FragmentState::CreateFragmentStateMap
(fragment_info=..., exec_request=...,
state=state@entry=0x7972db00, fragment_map=...) at fragment-state.cc:47
#3 0x0000000001292d71 in impala::QueryState::StartFInstances
(this=this@entry=0x7972db00) at query-state.cc:820
#4 0x0000000001284810 in impala::QueryExecMgr::ExecuteQueryHelper
(this=0x11943b00, qs=0x7972db00)
at query-exec-mgr.cc:162
#5 0x0000000001752915 in operator() (this=0x7f86cec7ab40)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
#6 impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=...,
functor=...,
parent_thread_info=<optimized out>, thread_started=0x7f87b7b9acb0) at
thread.cc:360
#7 0x0000000001753c9b in operator()<void (*)(const
std::__cxx11::basic_string<char>&, const std::__cxx11::basic_string<char>&,
boost::function<void()>, const impala::ThreadDebugInfo*, impala::Promise<long
int>*), boost::_bi::list0> (
a=<synthetic pointer>, f=@0x1f66f3b8: <error reading variable>,
this=0x1f66f3c0)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
#8 operator() (this=0x1f66f3b8)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
#9 boost::detail::thread_data<boost::_bi::bind_t<void, void
(*)(std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&, boost::function<void
()>, impala::ThreadDebugInfo const*, impala::Promise<long,
(impala::PromiseMode)0>*),
boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >,
boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
boost::_bi::value<impala::ThreadDebugInfo*>,
boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run()
(this=0x1f66f200)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
#10 0x0000000001fb4322 in thread_proxy ()
#11 0x00007f98af288ea5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f98ac2dfb0d in gnu_dev_makedev () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()
{code}
Note that this issue happened when extra load was added to the Impala cluster,
which caused a large number of RPC failures.
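One way to keep an executor from crashing on such a request is to validate it
before any per-fragment state is built, and to fail the query with an error
instead. The following is only a minimal sketch of that idea. The types
FragmentInfo, ExecRequest and ValidationResult and the function
ValidateExecRequest are hypothetical stand-ins made up for illustration; they
are not Impala's real Thrift/protobuf structures or its actual fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the parts of an ExecQueryFInstances request that
// matter here. These are NOT Impala's real Thrift/protobuf types.
struct FragmentInfo {
  int32_t fragment_idx;   // index of the fragment in the query plan
  int32_t num_instances;  // instances of this fragment assigned to the executor
};

struct ExecRequest {
  std::vector<FragmentInfo> fragments;
  // One entry per fragment instance; the value is the owning fragment's index.
  std::vector<int32_t> instance_fragment_idx;
};

struct ValidationResult {
  bool ok;
  std::string error;
};

// Rejects requests that carry no fragments or no instances, or whose
// per-fragment instance counts do not add up to the number of instance
// entries. A caller would return the error to the coordinator rather than
// start the query.
ValidationResult ValidateExecRequest(const ExecRequest& req) {
  if (req.fragments.empty()) {
    return {false, "request contains no fragments"};
  }
  if (req.instance_fragment_idx.empty()) {
    return {false, "request contains no fragment instances (#instances=0)"};
  }
  std::size_t expected = 0;
  for (const FragmentInfo& f : req.fragments) {
    if (f.num_instances <= 0) {
      return {false, "fragment " + std::to_string(f.fragment_idx) +
                         " has no instances"};
    }
    expected += static_cast<std::size_t>(f.num_instances);
  }
  if (expected != req.instance_fragment_idx.size()) {
    return {false, "instance count does not match per-fragment totals"};
  }
  return {true, ""};
}
```

With a check of this kind, a request like the one logged above (#instances=0)
would be rejected with an error message before any fragment state is
constructed.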
was:
In a customer report case, TExecPlanFragmentInfo received by executors with
instance number as 0, like
{code:java}
impalad.executor.net.impala.log.INFO.20240522-160138.197583:I0523
00:59:16.892853 199528 control-service.cc:148]
624c47e9264ebb62:5aa89af300000000] ExecQueryFInstances():
query_id=624c47e9264ebb62:5aa89af300000000 coord=coordinator.net:27000
#instances=0
......
I0523 00:59:19.306522 199185 kMinidump in thread
[1890723]query-state-624c47e9264ebb62:5aa89af300000000 running query
624c47e9264ebb62:5aa89af300000000, fragment instance
0000000000000000:0000000000000000
Wrote minidump to
/var/log/impala-minidumps/impalad/021b06ea-1627-4c69-9f27858a-f3cd9026.dmp
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00000000012ff9d9, pid=197583, tid=0x00007eefc98a0700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_381) (build 1.8.0_381-b09)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.381-b09 mixed mode linux-amd64
)
# Problematic frame:
# C [impalad+0xeff9d9]
impala::FragmentState::FragmentState(impala::QueryState*, impala::TPlanFragment
const&, impala::PlanFragmentCtxPB const&)+0xf9
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
{code}
From the collected profile, there was no fragment with instance number as 0 in
the query plan so coordinator should not send task to executor with number of
instances as 0. Executor log files showed that there were lots of KRPC errors
around the time when receiving invalid TExecPlanFragmentInfo. It seems KRPC
messages were truncated due to KRPC failures, but truncation might not cause
thrift deserialization error. The invalid TExecPlanFragmentInfo caused Impala
daemon to crash with following stack trace when the query was started on
executor.
{code:java}
#0 SubstituteArg (value=..., this=0x7f86cec79d30) at
../gutil/strings/substitute.h:79
#1 impala::FragmentState::FragmentState (this=0x35c78f40,
query_state=0x7972db00, fragment=...,
fragment_ctx=<error reading variable: Cannot access memory at address
0x35c78f88>) at fragment-state.cc:143
#2 0x00000000013019aa in impala::FragmentState::CreateFragmentStateMap
(fragment_info=..., exec_request=...,
state=state@entry=0x7972db00, fragment_map=...) at fragment-state.cc:47
#3 0x0000000001292d71 in impala::QueryState::StartFInstances
(this=this@entry=0x7972db00) at query-state.cc:820
#4 0x0000000001284810 in impala::QueryExecMgr::ExecuteQueryHelper
(this=0x11943b00, qs=0x7972db00)
at query-exec-mgr.cc:162
#5 0x0000000001752915 in operator() (this=0x7f86cec7ab40)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
#6 impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=...,
functor=...,
parent_thread_info=<optimized out>, thread_started=0x7f87b7b9acb0) at
thread.cc:360
#7 0x0000000001753c9b in operator()<void (*)(const
std::__cxx11::basic_string<char>&, const std::__cxx11::basic_string<char>&,
boost::function<void()>, const impala::ThreadDebugInfo*, impala::Promise<long
int>*), boost::_bi::list0> (
a=<synthetic pointer>, f=@0x1f66f3b8: <error reading variable>,
this=0x1f66f3c0)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
#8 operator() (this=0x1f66f3b8)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
#9 boost::detail::thread_data<boost::_bi::bind_t<void, void
(*)(std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&, boost::function<void
()>, impala::ThreadDebugInfo const*, impala::Promise<long,
(impala::PromiseMode)0>*),
boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >,
boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
boost::_bi::value<impala::ThreadDebugInfo*>,
boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run()
(this=0x1f66f200)
at
../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
#10 0x0000000001fb4322 in thread_proxy ()
#11 0x00007f98af288ea5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f98ac2dfb0d in gnu_dev_makedev () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()
{code}
> Invalid TExecPlanFragmentInfo received by executor with instance number as 0
> ----------------------------------------------------------------------------
>
> Key: IMPALA-13107
> URL: https://issues.apache.org/jira/browse/IMPALA-13107
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>