[
https://issues.apache.org/jira/browse/KUDU-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012653#comment-18012653
]
ASF subversion and git services commented on KUDU-2439:
-------------------------------------------------------
Commit bda99a38bb7da12959a99ffc9a79de02acdacf2a in kudu's branch
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=bda99a38b ]
KUDU-3635 call OPENSSL_cleanup() explicitly
With this changelist, the global state of the OpenSSL library is now
explicitly cleaned up when shutting down a process. The rationale
for this is outlined in [1].
This is applicable to the code linked against OpenSSL library of
versions 1.1.1 and newer. IIUC, that covers all the contemporary Linux
distributions at the time of writing. It worked for me at EOL Ubuntu
18.04 LTS as well (in particular, I tested it on Ubuntu 18.04.1 LTS).
As for testing, I verified that the issue is gone after observing its
manifestation with the frequency of about 1 in 10 runs of the kudu CLI
tool with RELEASE bits on RHEL8.8 and RHEL9.2 (both of x86_64 arch)
without the patch. For reproduction and verification, I ran the
following 100rep test run multiple times (and 1000rep just for
verification):
./bin/kudu-tool-test --gtest_repeat=100
Without the patch, I saw many core files left by the kudu CLI binary
during every 100rep run, where many of the core files had stack traces
similar to the one described in the JIRA item. With this patch,
no such core files were observed when running Kudu RELEASE bits.
However, there were still crashes with core files having stack traces
attributable to KUDU-2439. That's addressed in a follow-up patch.
[1]
https://developers.redhat.com/articles/2022/10/31/best-practices-application-shutdown-openssl
Change-Id: Ib08310d66a7eabb1996bde901f39f36f54bff483
Reviewed-on: http://gerrit.cloudera.org:8080/23222
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
> There's no way to safely clean up a KuduClient or Messenger
> -----------------------------------------------------------
>
> Key: KUDU-2439
> URL: https://issues.apache.org/jira/browse/KUDU-2439
> Project: Kudu
> Issue Type: Bug
> Components: client, rpc
> Affects Versions: 1.8.0
> Reporter: Adar Dembo
> Priority: Major
> Attachments: kudu-admin-test.7.txt.xz
>
>
> KuduClient has shared ownership, and its only "shutdown" knob is to drop its
> last ref. This drops the last ref on the underlying Messenger object, but
> Messenger itself has a funky "internal" vs. "external" ref system, and
> destroying the KuduClient only drops the last external ref. The Messenger is
> only destroyed when the last internal ref is dropped, and that only happens
> when outstanding reactor threads finish whatever processing they were busy
> doing. So, there's no way for a user to "destroy this KuduClient and wait for
> all outstanding resources to be cleaned up".
> Why is this important? For one, there's a known data race with outstanding
> work done by the KuduClient's DnsResolver. For two, there's a similar issue
> with OpenSSL. OpenSSL 1.1 registers an atexit() handler that tears down its
> global library state. For this to be safe, all allocated OpenSSL resources
> must have been released by the time atexit() handlers run (which is to say,
> all OpenSSL resources must be released by the time main() returns or exit()
> is called). Because we can't wait on the KuduClient to destroy itself and its
> OpenSSL resources, applications using the KuduClient may run the atexit()
> handler at an unsafe time.
> Here's the TSAN output for the OpenSSL data race:. It's trivial to reproduce
> via reactor-test once the appropriate suppression is removed from
> tsan-suppressions.txt:
> {noformat}
> WARNING: ThreadSanitizer: data race (pid=7914)
> Write of size 1 at 0x7b1000004340 by main thread:
> #0 pthread_rwlock_destroy
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1313
> (reactor-test+0x48205e)
> #1 CRYPTO_THREAD_lock_free
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:95
> (libcrypto.so.1.1+0x20d0e5)
> Previous atomic read of size 1 at 0x7b1000004340 by thread T16:
> #0 pthread_rwlock_wrlock
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1352
> (reactor-test+0x45ffe3)
> #1 CRYPTO_THREAD_write_lock
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:66
> (libcrypto.so.1.1+0x20d08a)
> #2 std::__1::__function::__func<void (*)(ssl_ctx_st*),
> std::__1::allocator<void (*)(ssl_ctx_st*)>, void
> (ssl_ctx_st*)>::operator()(ssl_ctx_st*&&)
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1562:12
> (libsecurity.so+0x56b14)
> #3 std::__1::function<void (ssl_ctx_st*)>::operator()(ssl_ctx_st*) const
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1916:12
> (libkrpc.so+0xb139d)
> #4 std::__1::unique_ptr<ssl_ctx_st, std::__1::function<void (ssl_ctx_st*)>
> >::reset(ssl_ctx_st*)
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2598:7
> (libkrpc.so+0xb0f9e)
> #5 std::__1::unique_ptr<ssl_ctx_st, std::__1::function<void (ssl_ctx_st*)>
> >::~unique_ptr()
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2552
> (libkrpc.so+0xb0f9e)
> #6 kudu::security::TlsContext::~TlsContext()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/security/tls_context.h:77
> (libkrpc.so+0xb0f9e)
> #7
> std::__1::default_delete<kudu::security::TlsContext>::operator()(kudu::security::TlsContext*)
> const
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2285:5
> (libkrpc.so+0xab32c)
> #8 std::__1::unique_ptr<kudu::security::TlsContext,
> std::__1::default_delete<kudu::security::TlsContext>
> >::reset(kudu::security::TlsContext*)
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2598
> (libkrpc.so+0xab32c)
> #9 std::__1::unique_ptr<kudu::security::TlsContext,
> std::__1::default_delete<kudu::security::TlsContext> >::~unique_ptr()
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2552
> (libkrpc.so+0xab32c)
> #10 kudu::rpc::Messenger::~Messenger()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:433
> (libkrpc.so+0xab32c)
> #11
> std::__1::default_delete<kudu::rpc::Messenger>::operator()(kudu::rpc::Messenger*)
> const
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2285:5
> (libkrpc.so+0xb0ddb)
> #12 std::__1::__shared_ptr_pointer<kudu::rpc::Messenger*,
> std::__1::default_delete<kudu::rpc::Messenger>,
> std::__1::allocator<kudu::rpc::Messenger> >::__on_zero_shared()
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3586
> (libkrpc.so+0xb0ddb)
> #13 std::__1::__shared_count::__release_shared()
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3490:9
> (reactor-test+0x4d0a2e)
> #14 std::__1::__shared_weak_count::__release_shared()
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3532
> (reactor-test+0x4d0a2e)
> #15 std::__1::shared_ptr<kudu::rpc::Messenger>::~shared_ptr()
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4468
> (reactor-test+0x4d0a2e)
> #16 std::__1::shared_ptr<kudu::rpc::Messenger>::reset()
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4603:5
> (libkrpc.so+0xbefd1)
> #17 kudu::rpc::ReactorThread::RunThread()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor.cc:482
> (libkrpc.so+0xbefd1)
> #18 boost::_mfi::mf0<void,
> kudu::rpc::ReactorThread>::operator()(kudu::rpc::ReactorThread*) const
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29
> (libkrpc.so+0xc8c69)
> #19 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*>
> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ReactorThread>,
> boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void,
> kudu::rpc::ReactorThread>&, boost::_bi::list0&, int)
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9
> (libkrpc.so+0xc8bba)
> #20 boost::_bi::bind_t<void, boost::_mfi::mf0<void,
> kudu::rpc::ReactorThread>,
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> >
> >::operator()()
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16
> (libkrpc.so+0xc8b43)
> #21
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
> boost::_mfi::mf0<void, kudu::rpc::ReactorThread>,
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> > >,
> void>::invoke(boost::detail::function::function_buffer&)
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11
> (libkrpc.so+0xc8939)
> #22 boost::function0<void>::operator()() const
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14
> (libkrpc.so+0xb8a21)
> #23 kudu::Thread::SuperviseThread(void*)
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/thread.cc:603:3
> (libkudu_util.so+0x1c1264)
> Location is heap block of size 56 at 0x7b1000004340 allocated by main thread:
> #0 malloc
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:666
> (reactor-test+0x47cc9c)
> #1 CRYPTO_malloc
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/mem.c:92
> (libcrypto.so.1.1+0x1a6327)
> #2 CRYPTO_THREAD_run_once
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:106
> (libcrypto.so.1.1+0x20d126)
> #3 CRYPTO_THREAD_run_once
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:106
> (libcrypto.so.1.1+0x20d126)
> #4 kudu::rpc::Messenger::Init()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:444:3
> (libkrpc.so+0xa9589)
> #5
> kudu::rpc::MessengerBuilder::Build(std::__1::shared_ptr<kudu::rpc::Messenger>*)
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:205:3
> (libkrpc.so+0xa903d)
> #6 kudu::rpc::RpcTestBase::CreateMessenger(std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&,
> std::__1::shared_ptr<kudu::rpc::Messenger>*, int, bool,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > const&, std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > const&, std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&)
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/rpc-test-base.h:462:16
> (reactor-test+0x4d3412)
> #7 kudu::rpc::ReactorTest::SetUp()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor-test.cc:46:5
> (reactor-test+0x4d02ae)
> #8 void
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test,
> void>(testing::Test*, void (testing::Test::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
> (libgmock.so+0x551df)
> #9 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test,
> void>(testing::Test*, void (testing::Test::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
> (libgmock.so+0x551df)
> #10 testing::Test::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3
> (libgmock.so+0x342b1)
> #11 testing::TestInfo::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11
> (libgmock.so+0x3563c)
> #12 testing::TestCase::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28
> (libgmock.so+0x36116)
> #13 testing::internal::UnitTestImpl::RunAllTests()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43
> (libgmock.so+0x424ea)
> #14 bool
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
> bool>(testing::internal::UnitTestImpl*, bool
> (testing::internal::UnitTestImpl::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
> (libgmock.so+0x5614f)
> #15 bool
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
> bool>(testing::internal::UnitTestImpl*, bool
> (testing::internal::UnitTestImpl::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
> (libgmock.so+0x5614f)
> #16 testing::UnitTest::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10
> (libgmock.so+0x41dd2)
> #17 RUN_ALL_TESTS()
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/gtest/gtest.h:2233:46
> (libkudu_test_main.so+0x33fb)
> #18 main
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/test_main.cc:106:13
> (libkudu_test_main.so+0x2bb6)
> Thread T16 'rpc reactor-793' (tid=7934, finished) created by main thread at:
> #0 pthread_create
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992
> (reactor-test+0x45f7d6)
> #1 kudu::Thread::StartThread(std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned
> long, scoped_refptr<kudu::Thread>*)
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/thread.cc:556:15
> (libkudu_util.so+0x1c0c8f)
> #2 kudu::Status kudu::Thread::Create<void (kudu::rpc::ReactorThread::*)(),
> kudu::rpc::ReactorThread*>(std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > const&, void (kudu::rpc::ReactorThread::*
> const&)(), kudu::rpc::ReactorThread* const&, scoped_refptr<kudu::Thread>*)
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/thread.h:164:12
> (libkrpc.so+0xc4095)
> #3 kudu::rpc::ReactorThread::Init()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor.cc:168:10
> (libkrpc.so+0xbeace)
> #4 kudu::rpc::Reactor::Init()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor.cc:728:18
> (libkrpc.so+0xc2f81)
> #5 kudu::rpc::Messenger::Init()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:446:5
> (libkrpc.so+0xa95e2)
> #6
> kudu::rpc::MessengerBuilder::Build(std::__1::shared_ptr<kudu::rpc::Messenger>*)
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:205:3
> (libkrpc.so+0xa903d)
> #7 kudu::rpc::RpcTestBase::CreateMessenger(std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&,
> std::__1::shared_ptr<kudu::rpc::Messenger>*, int, bool,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > const&, std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > const&, std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const&)
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/rpc-test-base.h:462:16
> (reactor-test+0x4d3412)
> #8 kudu::rpc::ReactorTest::SetUp()
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor-test.cc:46:5
> (reactor-test+0x4d02ae)
> #9 void
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test,
> void>(testing::Test*, void (testing::Test::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
> (libgmock.so+0x551df)
> #10 void
> testing::internal::HandleExceptionsInMethodIfSupported<testing::Test,
> void>(testing::Test*, void (testing::Test::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
> (libgmock.so+0x551df)
> #11 testing::Test::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3
> (libgmock.so+0x342b1)
> #12 testing::TestInfo::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11
> (libgmock.so+0x3563c)
> #13 testing::TestCase::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28
> (libgmock.so+0x36116)
> #14 testing::internal::UnitTestImpl::RunAllTests()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43
> (libgmock.so+0x424ea)
> #15 bool
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
> bool>(testing::internal::UnitTestImpl*, bool
> (testing::internal::UnitTestImpl::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
> (libgmock.so+0x5614f)
> #16 bool
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
> bool>(testing::internal::UnitTestImpl*, bool
> (testing::internal::UnitTestImpl::*)(), char const*)
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
> (libgmock.so+0x5614f)
> #17 testing::UnitTest::Run()
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10
> (libgmock.so+0x41dd2)
> #18 RUN_ALL_TESTS()
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/gtest/gtest.h:2233:46
> (libkudu_test_main.so+0x33fb)
> #19 main
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/test_main.cc:106:13
> (libkudu_test_main.so+0x2bb6)
> SUMMARY: ThreadSanitizer: data race
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:95
> in CRYPTO_THREAD_lock_free
> {noformat}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)