[ 
https://issues.apache.org/jira/browse/KUDU-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773571#comment-16773571
 ] 

Michael Ho commented on KUDU-2706:
----------------------------------

Given the apparent lack of thread safety in the Kerberos library, should we 
consider adding our own synchronization in Kudu instead? It could be 
expensive, and so far we seem to have gotten away without any lock.
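
To make the locking idea concrete, here is a minimal sketch of serializing 
the lazy initialization behind a process-wide mutex. It does not use the real 
Kerberos API: {{FakeKrb5Ctx}}, {{FakeGetDefaultRealm()}}, {{g_fake_ctx}}, 
{{g_fake_ctx_lock}}, {{GetDefaultRealmLocked()}} and the realm string are all 
hypothetical stand-ins that only mimic the check-then-initialize pattern 
quoted below, so treat it as an illustration rather than a proposed patch.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for krb5_context: a single lazily filled field.
struct FakeKrb5Ctx {
  char* default_realm = nullptr;
  int init_count = 0;  // how many times the lazy initialization ran
};

// Mimics the unsynchronized check-then-initialize pattern of
// krb5_get_default_realm(). Not thread-safe on its own.
int FakeGetDefaultRealm(FakeKrb5Ctx* ctx, char** realm_out) {
  *realm_out = nullptr;
  if (ctx->default_realm == nullptr) {
    ctx->default_realm = strdup("EXAMPLE.COM");  // assumed realm value
    if (ctx->default_realm == nullptr) return 1;
    ++ctx->init_count;
  }
  *realm_out = strdup(ctx->default_realm);
  return (*realm_out == nullptr) ? 1 : 0;
}

// Shared context and the lock that guards every access to it.
FakeKrb5Ctx g_fake_ctx;
std::mutex g_fake_ctx_lock;

// All callers go through this wrapper, so the lazy initialization and the
// subsequent read of default_realm never overlap between threads.
bool GetDefaultRealmLocked(std::string* out) {
  std::lock_guard<std::mutex> l(g_fake_ctx_lock);
  char* realm = nullptr;
  if (FakeGetDefaultRealm(&g_fake_ctx, &realm) != 0) return false;
  out->assign(realm);
  free(realm);
  return true;
}

// Runs num_threads concurrent lookups and returns how many of them
// succeeded and saw the expected realm.
int RunConcurrentLookups(int num_threads) {
  std::vector<int> ok(num_threads, 0);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([i, &ok]() {
      std::string realm;
      ok[i] = GetDefaultRealmLocked(&realm) && realm == "EXAMPLE.COM";
    });
  }
  for (auto& t : threads) t.join();
  int succeeded = 0;
  for (int v : ok) succeeded += v;
  return succeeded;
}
```

The cost is that every call takes the lock, including calls made long after 
the realm has been cached, which is the expense I am worried about on the 
negotiation path.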

> Race in CanonicalizeKrb5Principal() due to lazy initialization of 
> g_kinit_ctx->default_realm
> --------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2706
>                 URL: https://issues.apache.org/jira/browse/KUDU-2706
>             Project: Kudu
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 1.8.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Critical
>
> As far as I understand, the assumption is that {{g_krb5_ctx}} is global and 
> shared, and that it should not be modified after initialization. However, 
> various code paths in {{kudu::security}} call Kerberos functions that may 
> modify {{g_krb5_ctx}} inadvertently. 
> The default initialization code {{krb5_init_context(&g_krb5_ctx)}} called by 
> {{kudu::security::InitKrb5Ctx()}} only sets {{g_krb5_ctx->default_realm}} to 
> 0. Upon the first call to {{krb5_parse_name()}}, the Kerberos library 
> calls {{krb5_get_default_realm()}} to look up the default realm because the 
> cached realm is still {{NULL}}:
> {noformat}
> krb5_error_code KRB5_CALLCONV
> krb5_get_default_realm(krb5_context context, char **realm_out)
> {
>     krb5_error_code ret;
>     *realm_out = NULL;
>     if (context == NULL || context->magic != KV5M_CONTEXT)
>         return KV5M_CONTEXT;
>     if (context->default_realm == NULL) {
>         ret = get_default_realm(context, &context->default_realm); // <<<--- non-thread-safe call
>         if (ret)
>             return ret;
>     }
>     *realm_out = strdup(context->default_realm);
>     return (*realm_out == NULL) ? ENOMEM : 0;
> }
> {noformat}
> Apparently, {{krb5_get_default_realm()}} may modify {{g_krb5_ctx}}, but it 
> is not thread safe. So, if multiple negotiation threads enter this code path 
> at the same time, they may step on each other and corrupt {{g_krb5_ctx}}, 
> leading to the crash seen in the stack trace below or to error messages like 
> the following:
> {noformat}
> 0216 14:26:07.459600 (+   296us) negotiation.cc:304] Negotiation complete: 
> Runtime error: Server connection negotiation failed: server connection from 
> X.X.X.X:37070: could not canonicalize krb5 principal: could not parse 
> principal: Configuration file does not specify default realm
> {noformat}
> Stack trace showing the crash:
> {noformat}
> #0  0x00007fb03e1fa1f7 in raise () from sysroot/lib64/libc.so.6
> #1  0x00007fb03e1fb8e8 in abort () from sysroot/lib64/libc.so.6
> #2  0x00007fb041159185 in os::abort(bool) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #3  0x00007fb0412fb593 in VMError::report_and_die() () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #4  0x00007fb04115e68f in JVM_handle_linux_signal () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #5  0x00007fb041154be3 in signalHandler(int, siginfo*, void*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #6  <signal handler called>
> #7  0x00000000048d0a53 in 
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int) ()
> #8  0x00000000048d0aec in 
> tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned 
> long) ()
> #9  0x0000000004a0b4c0 in tc_free ()
> #10 0x00007fb040d32933 in ElfDecoder::demangle(char const*, char*, int) () 
> from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #11 0x00007fb040d3222a in Decoder::demangle(char const*, char*, int) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #12 0x00007fb04115695d in os::dll_address_to_function_name(unsigned char*, 
> char*, int, int*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #13 0x00007fb040dc0222 in frame::print_C_frame(outputStream*, char*, int, 
> unsigned char*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #14 0x00007fb040d2e925 in print_native_stack(outputStream*, frame, Thread*, 
> char*, int) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #15 0x00007fb0412f9cc8 in VMError::report(outputStream*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #16 0x00007fb0412fb18a in VMError::report_and_die() () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #17 0x00007fb04115e68f in JVM_handle_linux_signal () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #18 0x00007fb041154be3 in signalHandler(int, siginfo*, void*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #19 <signal handler called>
> #20 0x00000000048d0a53 in 
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int) ()
> #21 0x00000000048d0aec in 
> tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned 
> long) ()
> #22 0x0000000004a0b4c0 in tc_free ()
> #23 0x00007fb03e5915dd in pthread_attr_destroy () from 
> sysroot/lib64/libpthread.so.0
> #24 0x00007fb04115e49f in current_stack_region(unsigned char**, unsigned 
> long*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #25 0x00007fb04115e535 in os::current_stack_base() () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #26 0x00007fb0412faeb4 in VMError::report(outputStream*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #27 0x00007fb0412fb18a in VMError::report_and_die() () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #28 0x00007fb04115e68f in JVM_handle_linux_signal () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #29 0x00007fb041154be3 in signalHandler(int, siginfo*, void*) () from 
> sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #30 <signal handler called>
> #31 0x00000000048d0a53 in 
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int) ()
> #32 0x00000000048d0aec in 
> tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned 
> long) ()
> #33 0x0000000004a0b4c0 in tc_free ()
> #34 0x00007fb03f051720 in profile_iterator_free () from 
> sysroot/lib64/libkrb5.so.3
> #35 0x00007fb03f0519a4 in profile_get_value () from sysroot/lib64/libkrb5.so.3
> #36 0x00007fb03f051a18 in profile_get_string () from 
> sysroot/lib64/libkrb5.so.3
> #37 0x00007fb03f044dde in profile_default_realm () from 
> sysroot/lib64/libkrb5.so.3
> #38 0x00007fb03f044509 in krb5_get_default_realm () from 
> sysroot/lib64/libkrb5.so.3
> #39 0x00007fb03f0245e8 in krb5_parse_name_flags () from 
> sysroot/lib64/libkrb5.so.3
> #40 0x0000000001ff7bbf in 
> kudu::security::CanonicalizeKrb5Principal(std::string*) ()
> #41 0x00000000026ee4df in 
> kudu::rpc::ServerNegotiation::AuthenticateBySasl(kudu::faststring*) ()
> #42 0x00000000026ea929 in kudu::rpc::ServerNegotiation::Negotiate() ()
> #43 0x000000000271035b in 
> kudu::rpc::DoServerNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, 
> kudu::TriStateFlag, kudu::MonoTime const&) ()
> #44 0x000000000271070d in 
> kudu::rpc::Negotiation::RunNegotiation(scoped_refptr<kudu::rpc::Connection> 
> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime) ()
> #45 0x00000000026ca8ab in kudu::internal::RunnableAdapter<void 
> (*)(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, 
> kudu::TriStateFlag, 
> kudu::MonoTime)>::Run(scoped_refptr<kudu::rpc::Connection> const&, 
> kudu::TriStateFlag const&, kudu::TriStateFlag const&, kudu::MonoTime const&) 
> ()
> #46 0x00000000026c9bf4 in kudu::internal::InvokeHelper<false, void, 
> kudu::internal::RunnableAdapter<void (*)(scoped_refptr<kudu::rpc::Connection> 
> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime)>, void 
> (kudu::rpc::Connection*, kudu::TriStateFlag const&, 
> kudu::TriStateFlag const&, kudu::MonoTime 
> const&)>::MakeItSo(kudu::internal::RunnableAdapter<void 
> (*)(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, 
> kudu::TriStateFlag, kudu::MonoTime)>, kudu::rpc::Connection*, 
> kudu::TriStateFlag const&, kudu::TriStateFlag const&, kudu::MonoTime const&) 
> ()
> #47 0x00000000026c8ad3 in kudu::internal::Invoker<4, 
> kudu::internal::BindState<kudu::internal::RunnableAdapter<void 
> (*)(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, 
> kudu::TriStateFlag, kudu::MonoTime)>, void 
> (scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, 
> kudu::TriStateFlag, kudu::MonoTime), void 
> (scoped_refptr<kudu::rpc::Connection>, kudu::TriStateFlag, 
> kudu::TriStateFlag, kudu::MonoTime)>, void 
> (scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, 
> kudu::TriStateFlag, kudu::MonoTime)>::Run(kudu::internal::BindStateBase*) ()
> #48 0x0000000001dae84c in kudu::Callback<void ()>::Run() const ()
> #49 0x000000000295a66a in kudu::ClosureRunnable::Run() ()
> #50 0x00000000029595fd in kudu::ThreadPool::DispatchThread() ()
> #51 0x00000000029650d5 in boost::_mfi::mf0<void, 
> kudu::ThreadPool>::operator()(kudu::ThreadPool*) const ()
> #52 0x0000000002964602 in void 
> boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> 
> >::operator()<boost::_mfi::mf0<void, kudu::ThreadPool>, 
> boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, 
> kudu::ThreadPool>&, boost::_bi::list0&, int) ()
> #53 0x0000000002963a05 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, 
> kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > 
> >::operator()() ()
> #54 0x0000000002962b61 in 
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, 
> boost::_mfi::mf0<void, kudu::ThreadPool>, 
> boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >, 
> void>::invoke(boost::detail::function::function_buffer&) ()
> #55 0x0000000001d76514 in boost::function0<void>::operator()() const ()
> #56 0x0000000001d72da2 in kudu::Thread::SuperviseThread(void*) ()
> #57 0x00007fb03e58fe25 in start_thread () from sysroot/lib64/libpthread.so.0
> #58 0x00007fb03e2bd34d in clone () from sysroot/lib64/libc.so.6
> {noformat}
> [~tlipcon] kindly pointed out that someone reported a similar issue to 
> upstream Kerberos in the past 
> (http://krbdev.mit.edu/rt/Ticket/Display.html?id=2855).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
