[
https://issues.apache.org/jira/browse/KUDU-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702895#comment-17702895
]
Joe McDonnell commented on KUDU-3461:
-------------------------------------
>From what I can tell, this seems to be what happens:
MetaCacheServerPicker::PickLeader() goes through this chunk of logic without
finding a leader
([https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L461-L506])
Then, it calls into MetaCache::LookupTabletByKey() with LookUpTabletCb() as the
callback function
([https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L521-L529])
MetaCache::LookupTabletByKey() finds it in the hot path and calls the callback
function LookUpTabletCb().
[https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L1405-L1410]
MetaCacheServerPicker::LookUpTabletCb() calls into PickLeader():
[https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L585]
Repeat until the stack is gone.
> Kudu client can blow the stack with infinite recursions between PickLeader()
> and LookupTabletByKey()
> ----------------------------------------------------------------------------------------------------
>
> Key: KUDU-3461
> URL: https://issues.apache.org/jira/browse/KUDU-3461
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: 1.17.0
> Reporter: Joe McDonnell
> Priority: Blocker
>
> In an Impala cluster, we ran into a scenario that causes Impala to crash with
> a SIGSEGV. When reproducing while running in gdb, we see the stack get blown
> out with this recursion:
> {noformat}
> #0 0x00007f983e031a1c in clock_gettime ()
> #1 0x00007f983bfda0b5 in __GI___clock_gettime (clock_id=clock_id@entry=1,
> tp=0x7f967bd8b070) at ../sysdeps/unix/sysv/linux/clock_gettime.c:38
> #2 0x00007f983c9f8e48 in kudu::Stopwatch::GetTimes (times=0x7f967bd8b1b0,
> this=<optimized out>, this=<optimized out>) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:294
> #3 0x00007f983ca09829 in kudu::Stopwatch::stop (this=0x7f967bd8b320) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:218
> #4 kudu::Stopwatch::stop (this=0x7f967bd8b320) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:213
> #5 kudu::sw_internal::LogTiming::Print (max_expected_millis=50,
> this=0x7f967bd8b320) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:359
> #6 kudu::sw_internal::LogTiming::~LogTiming (this=0x7f967bd8b320,
> __in_chrg=<optimized out>) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:329
> #7 0x00007f983c9fe32c in
> kudu::client::internal::MetaCache::LookupEntryByKeyFastPath (this=<optimized
> out>, table=<optimized out>, partition_key=..., entry=0x7f967bd8b4c0) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/locks.h:99
> #8 0x00007f983c9fe656 in kudu::client::internal::MetaCache::DoFastPathLookup
> (this=0xde431e0, table=0xf899300, partition_key=0x7f967bd8b700,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1243
> #9 0x00007f983ca05731 in
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1405
> #10 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #11 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967bd8b8c0) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #12
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #13 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #14 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967bd8bad0) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #15
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #16 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #17 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967bd8bce0) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #18
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #19 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #20 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967bd8bef0) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #21
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #22 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #23 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967bd8c100) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #24
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #25 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #26 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967bd8c310) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #27
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> ... continues ...
> #47617 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #47618 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967c589290) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47619
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #47620 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> --Type <RET> for more, q to quit, c to continue without paging--
> #47621 0x00007f983ca0575f in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0x7f967c5894a0) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47622
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable
> const*, kudu::PartitionKey, kudu::MonoTime const&,
> kudu::client::internal::MetaCache::LookupType,
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300,
> partition_key=..., deadline=...,
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint,
> remote_tablet=0x0, callback=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #47623 0x00007f983ca0598c in
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #47624 0x00007f983ca066a7 in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const (__args#0=...,
> this=0xca50918) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617#47625
> kudu::client::internal::LookupRpc::SendRpcCb (this=0xca50800, status=...) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:966
> #47626 0x00007f983c9db65c in
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}::operator()()
> const (this=<optimized out>, this=<optimized out>)
> at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/status.h:230#47627
> std::__invoke_impl<void,
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&>(std::__invoke_other,
>
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&)
> (__f=...) at /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/invoke.h:60
> #47628 std::__invoke_r<void,
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&>(void&&,
> (kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&)...)
> (__fn=...) at /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/invoke.h:110
> #47629 std::_Function_handler<void (),
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}>::_M_invoke(std::_Any_data
> const&) (__functor=...)
> at /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:291
> #47630 0x00007f983cac860b in std::function<void ()>::operator()() const
> (this=0xee3f9c0) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47631 kudu::rpc::OutboundCall::CallCallback (this=0xee3f840) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/rpc/outbound_call.cc:309
> #47632 0x00007f983cabb763 in kudu::rpc::Connection::HandleCallResponse
> (this=0xcd00700, transfer=...) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:172
> #47633 0x00007f983cabc215 in kudu::rpc::Connection::ReadHandler
> (this=0xcd00700, watcher=..., revents=<optimized out>) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:172#47634
> 0x00007f983cdb3ffb in ev_invoke_pending (loop=0xcc99b00) at
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/src/libev-4.20/ev.c:3155
> #47635 0x00007f983ca97cc8 in kudu::rpc::ReactorThread::InvokePendingCb
> (loop=0xcc99b00) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/rpc/reactor.cc:202
> #47636 0x00007f983cdb73f7 in ev_run (flags=0, loop=0xcc99b00) at
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/src/libev-4.20/ev.c:3555
> #47637 ev_run (loop=0xcc99b00, flags=0) at
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/src/libev-4.20/ev.c:3402
> #47638 0x00007f983ca98bd9 in ev::loop_ref::run (flags=0, this=0xef75be0) at
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/installed/uninstrumented/include/ev++.h:211#47639
> kudu::rpc::ReactorThread::RunThread (this=0xef75bd8) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/rpc/reactor.cc:503
> #47640 0x00007f983cc2d36c in std::function<void ()>::operator()() const
> (this=0xec68358) at
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47641 kudu::Thread::SuperviseThread (arg=0xec68300) at
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/thread.cc:691
> #47642 0x00007f983dfec609 in start_thread (arg=<optimized out>) at
> pthread_create.c:477
> #47643 0x00007f983c01c133 in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95{noformat}
> It hits a SIGSEGV because the stack gets blown out.
> Here are the steps to reproduce it from Impala:
> {noformat}
> /** 1. Create table **/
> drop table if exists impala_crash;
> create table if not exists impala_crash (
> dt string,
> col string,
> primary key(dt)
> )
> partition by range(dt) (
> partition values <= '00000000'
> )
> stored as kudu;/** 2. alter and insert **/
> alter table impala_crash drop if exists range partition value='20230301';
> alter table impala_crash add if not exists range partition value='20230301';
> insert into impala_crash values ('20230301','abc');
> /* normal *//** 3. Run the same queries again and impala daemon crashes **/
> alter table impala_crash drop if exists range partition value='20230301';
> alter table impala_crash add if not exists range partition value='20230301';
> insert into impala_crash values ('20230301','abc');{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)