[security] avoid kerberos ticket renewal and only reacquire

It was found that if we use a file based credential cache that is
shared between the C++ side and the java side of a process, and we
encounter the specific edge case where we renew a ticket that has
less than 'ticket_lifetime' left before its 'renew_lifetime' expires,
the ticket is set to have a NULL 'renew_till' timestamp.

Eg:
ticket_lifetime = 10m
renew_lifetime = 100m

[current ticket being renewed] at '15:30:00'
endtime = '15:30:30'
renew_till = '15:31:00'

This ticket will be renewed and the renewed ticket will have the
following values:
endtime = '15:31:00'
renew_till = null

The Java krb5 library refuses to read these kinds of tickets which
have the RENEWABLE flag set but no 'renew_till' set, causing
unexpected failures.

Currently, the only way to work around this is to not renew tickets
at all and only always reacquire them. The reason for this is that
the Java side of a process or even another process may be running
its own renewal thread on the same credential cache for the same
principal(s). So even if we were to avoid renewing in this window,
the Java side could renew in this window, causing the above problem.
If we always reacquire the tickets, we're forcefully reseting this
window for that principal, thereby not allowing the Java side to hit
this bug.
The scenario where this bug played out is when using the kudu renewal
code in tandem with a hadoop process that use the same principals.

Also, currently there is no advantage we gain from just renewing the
tickets vs. reacquiring them, either in terms of security or
performance, since we login from a keytab.

Tracked on the Java side by:
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8186576

Change-Id: I8e5225de332ba785e3a73014b8418cfd4059fe07
Reviewed-on: http://gerrit.cloudera.org:8080/7810
Reviewed-by: Todd Lipcon <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/7898
Reviewed-by: Sailesh Mukil <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/2774d80f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/2774d80f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/2774d80f

Branch: refs/heads/master
Commit: 2774d80f6b604a4e6256606c6f50c9d21f715fa4
Parents: 64e2802
Author: Sailesh Mukil <[email protected]>
Authored: Mon Aug 21 22:05:39 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Sat Sep 2 03:22:00 2017 +0000

----------------------------------------------------------------------
 be/src/kudu/security/init.cc | 82 +++++++++++++--------------------------
 be/src/kudu/security/init.h  |  4 +-
 2 files changed, 29 insertions(+), 57 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/2774d80f/be/src/kudu/security/init.cc
----------------------------------------------------------------------
diff --git a/be/src/kudu/security/init.cc b/be/src/kudu/security/init.cc
index aff20e9..9678373 100644
--- a/be/src/kudu/security/init.cc
+++ b/be/src/kudu/security/init.cc
@@ -68,12 +68,12 @@ class KinitContext;
 // Global context for usage of the Krb5 library.
 krb5_context g_krb5_ctx;
 
-// Global instance of the context used by the kinit/renewal thread.
+// Global instance of the context used by the kinit/reacquire thread.
 KinitContext* g_kinit_ctx;
 
-// This lock is used to avoid a race while renewing the kerberos ticket.
+// This lock is used to avoid a race while reacquiring the kerberos ticket.
 // The race can occur between the time we reinitialize the cache and the
-// time when we actually store the renewed credential back in the cache.
+// time when we actually store the new credentials back in the cache.
 RWMutex* g_kerberos_reinit_lock;
 
 class KinitContext {
@@ -184,15 +184,15 @@ void RenewThread() {
 int32_t KinitContext::GetNextRenewInterval(uint32_t num_retries) {
   int32_t time_remaining = ticket_end_timestamp_ - time(nullptr);
 
-  // If the last ticket renewal was a failure, we back off our retry attempts 
exponentially.
+  // If the last ticket reacqusition was a failure, we back off our retry 
attempts exponentially.
   if (num_retries > 0) return GetBackedOffRenewInterval(time_remaining, 
num_retries);
 
   // If the time remaining between now and ticket expiry is:
-  // * > 10 minutes:   We attempt to renew the ticket between 5 seconds and 5 
minutes before the
+  // * > 10 minutes:   We attempt to reacquire the ticket between 5 seconds 
and 5 minutes before the
   //                   ticket expires.
-  // * 5 - 10 minutes: We attempt to renew the ticket betwen 5 seconds and 1 
minute before the
+  // * 5 - 10 minutes: We attempt to reacquire the ticket betwen 5 seconds and 
1 minute before the
   //                   ticket expires.
-  // * < 5 minutes:    Attempt to renew the ticket every 'time_remaining'.
+  // * < 5 minutes:    Attempt to reacquire the ticket every 'time_remaining'.
   // The jitter is added to make sure that every server doesn't flood the KDC 
at the same time.
   random_device rd;
   mt19937 generator(rd());
@@ -266,7 +266,7 @@ Status KinitContext::DoRenewal() {
         krb5_free_cred_contents(g_krb5_ctx, &creds); });
     if (krb5_is_config_principal(g_krb5_ctx, creds.server)) continue;
 
-    // We only want to renew the TGT (Ticket Granting Ticket). Ignore all 
other tickets.
+    // We only want to reacquire the TGT (Ticket Granting Ticket). Ignore all 
other tickets.
     // This follows the same format as is_local_tgt() from 
krb5:src/clients/klist/klist.c
     if (creds.server->length != 2 ||
         data_eq(creds.server->data[1], principal_->realm) == 0 ||
@@ -275,58 +275,30 @@ Status KinitContext::DoRenewal() {
       continue;
     }
 
-    time_t now = time(nullptr);
-    time_t ticket_expiry = creds.times.endtime;
-    time_t renew_till = creds.times.renew_till;
-    time_t renew_deadline = renew_till - 30;
-
     krb5_creds new_creds;
     memset(&new_creds, 0, sizeof(krb5_creds));
     auto cleanup_new_creds = MakeScopedCleanup([&]() {
         krb5_free_cred_contents(g_krb5_ctx, &new_creds); });
-    // If the ticket has already expired or if there's only a short period 
before which the
-    // renew window closes, we acquire a new ticket.
-    if (ticket_expiry < now || renew_deadline < now) {
-      // Acquire a new ticket using the keytab. This ticket will automatically 
be put into the
-      // credential cache.
-      {
-        std::lock_guard<RWMutex> l(*g_kerberos_reinit_lock);
-        KRB5_RETURN_NOT_OK_PREPEND(krb5_get_init_creds_keytab(g_krb5_ctx, 
&new_creds, principal_,
-                                                              keytab_, 0 /* 
valid from now */,
-                                                              nullptr /* TKT 
service name */,
-                                                              opts_),
-                                   "Reacquire error: unable to login from 
keytab");
+    // Acquire a new ticket using the keytab. This ticket will automatically 
be put into the
+    // credential cache.
+    {
+      std::lock_guard<RWMutex> l(*g_kerberos_reinit_lock);
+      KRB5_RETURN_NOT_OK_PREPEND(krb5_get_init_creds_keytab(g_krb5_ctx, 
&new_creds, principal_,
+                                                            keytab_, 0 /* 
valid from now */,
+                                                            nullptr /* TKT 
service name */,
+                                                            opts_),
+                                 "Reacquire error: unable to login from 
keytab");
 #ifndef HAVE_KRB5_GET_INIT_CREDS_OPT_SET_OUT_CCACHE
-        // Heimdal krb5 doesn't have the 
'krb5_get_init_creds_opt_set_out_ccache' option,
-        // so use this alternate route.
-        KRB5_RETURN_NOT_OK_PREPEND(krb5_cc_initialize(g_krb5_ctx, ccache_, 
principal_),
-                                   "Reacquire error: could not init ccache");
+      // Heimdal krb5 doesn't have the 
'krb5_get_init_creds_opt_set_out_ccache' option,
+      // so use this alternate route.
+      KRB5_RETURN_NOT_OK_PREPEND(krb5_cc_initialize(g_krb5_ctx, ccache_, 
principal_),
+                                 "Reacquire error: could not init ccache");
 
-        KRB5_RETURN_NOT_OK_PREPEND(krb5_cc_store_cred(g_krb5_ctx, ccache_, 
&creds),
-                                   "Reacquire error: could not store creds in 
cache");
+      KRB5_RETURN_NOT_OK_PREPEND(krb5_cc_store_cred(g_krb5_ctx, ccache_, 
&creds),
+                                 "Reacquire error: could not store creds in 
cache");
 #endif
-      }
-      LOG(INFO) << "Successfully reacquired a new kerberos TGT";
-    } else {
-      // Renew existing ticket.
-      KRB5_RETURN_NOT_OK_PREPEND(krb5_get_renewed_creds(g_krb5_ctx, 
&new_creds, principal_,
-                                                        ccache_, nullptr),
-                                 "Failed to renew ticket");
-
-      {
-        // Take the write lock here so that any connections undergoing 
negotiation have to wait
-        // until the new credentials are placed in the cache.
-        std::lock_guard<RWMutex> l(*g_kerberos_reinit_lock);
-        // Clear existing credentials in cache.
-        KRB5_RETURN_NOT_OK_PREPEND(krb5_cc_initialize(g_krb5_ctx, ccache_, 
principal_),
-                                   "Failed to re-initialize ccache");
-
-        // Store the new credentials in the cache.
-        KRB5_RETURN_NOT_OK_PREPEND(krb5_cc_store_cred(g_krb5_ctx, ccache_, 
&new_creds),
-                                   "Failed to store credentials in ccache");
-      }
-      LOG(INFO) << "Successfully renewed kerberos TGT";
     }
+    LOG(INFO) << "Successfully reacquired a new kerberos TGT";
     ticket_end_timestamp_ = new_creds.times.endtime;
     break;
   }
@@ -500,9 +472,9 @@ Status InitKerberosForServer() {
   RETURN_NOT_OK_PREPEND(g_kinit_ctx->Kinit(FLAGS_keytab_file, principal), 
"unable to kinit");
 
   g_kerberos_reinit_lock = new RWMutex(RWMutex::Priority::PREFER_WRITING);
-  scoped_refptr<Thread> renew_thread;
-  // Start the renewal thread.
-  RETURN_NOT_OK(Thread::Create("kerberos", "renewal thread", &RenewThread, 
&renew_thread));
+  scoped_refptr<Thread> reacquire_thread;
+  // Start the reacquire thread.
+  RETURN_NOT_OK(Thread::Create("kerberos", "reacquire thread", &RenewThread, 
&reacquire_thread));
 
   return Status::OK();
 }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/2774d80f/be/src/kudu/security/init.h
----------------------------------------------------------------------
diff --git a/be/src/kudu/security/init.h b/be/src/kudu/security/init.h
index 61c9577..60a5a5e 100644
--- a/be/src/kudu/security/init.h
+++ b/be/src/kudu/security/init.h
@@ -32,8 +32,8 @@ namespace security {
 Status InitKerberosForServer();
 
 // Returns the process lock 'kerberos_reinit_lock'
-// This lock is acquired in write mode while the ticket is being renewed, and
-// acquired in read mode before using the SASL library which might require a 
ticket.
+// This lock is taken in write mode while the ticket is being reacquired, and
+// taken in read mode before using the SASL library which might require a 
ticket.
 RWMutex* KerberosReinitLock();
 
 // Return the full principal (user/host@REALM) that the server has used to

Reply via email to