[ https://issues.apache.org/jira/browse/KUDU-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057190#comment-17057190 ]

ASF subversion and git services commented on KUDU-3050:
-------------------------------------------------------

Commit d74ad32df7e41f6c9b03edb8b7be27706b507c2c in kudu's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d74ad32 ]

KUDU-3050: recover from corrupt kerberos ccache

This handles two failure modes:
* krb5_cc_start_seq_get() can fail if the kerberos credential cache gets
  corrupted on disk, e.g. is truncated.
* the renewal can fail to find a credential in the credential cache,
  either if it is missing or the renewal thread hits an error while
  reading through credentials.
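
As a rough illustration of the recovery idea (not the code from this commit), the sketch below tries a renewal first and falls back to a fresh login if the renewal fails for any reason, including an unreadable cache or a missing credential. The names RenewOutcome and RenewOrReinit, and the two callbacks standing in for the real renewal and Kinit() steps, are hypothetical.

```cpp
#include <functional>

// Hypothetical result type for this sketch; not part of Kudu.
enum class RenewOutcome { kRenewed, kReinitialized, kFailed };

// Hypothetical helper. `renew` stands in for the ticket renewal step and
// `kinit` for a fresh login that rewrites the credential cache. Each
// callback returns true on success.
RenewOutcome RenewOrReinit(const std::function<bool()>& renew,
                           const std::function<bool()>& kinit) {
  if (renew()) {
    return RenewOutcome::kRenewed;
  }
  // Renewal failed (for example, the cache could not be read or the
  // credential was not found), so try to rebuild the cache from scratch.
  if (kinit()) {
    return RenewOutcome::kReinitialized;
  }
  // Both steps failed; the caller is expected to back off and retry later.
  return RenewOutcome::kFailed;
}
```

A caller would log the outcome and, on kFailed, wait before the next attempt rather than giving up.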

Also add some additional logging and limit the max backoff time
to make it easier to debug other kinds of renewal errors.
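
To make the capped backoff concrete, here is a minimal sketch, assuming a simple doubling schedule. The function name BackoffSeconds and its parameters are hypothetical, and the actual base delay and cap used by the commit are not stated in this message.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not Kudu's actual code. Returns the delay in seconds
// before the next renewal attempt after `num_failures` consecutive failures.
// The delay starts at `base_secs`, doubles with each further failure, and
// never exceeds `max_secs`. Assumes `max_secs` is small enough that doubling
// cannot overflow int64_t.
inline int64_t BackoffSeconds(int num_failures, int64_t base_secs,
                              int64_t max_secs) {
  int64_t delay = base_secs;
  // Stop doubling once the cap is reached so the loop stays short.
  for (int i = 1; i < num_failures && delay < max_secs; ++i) {
    delay *= 2;
  }
  return std::min(delay, max_secs);
}
```

With a cap in place, a long run of failures settles at a fixed retry interval instead of growing without bound, which also keeps the log output at a predictable rate.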

The test triggers a pre-existing memory leak bug in some older
Kerberos libraries. Added a leak sanitizer suppression for
ClientNegotiation::CheckGSSAPI() to silence it.

Test:
Add a test that exercises the recovery logic after truncating
the credential cache. The test failed before this change.

Change-Id: I2d6e06c3ea65708896a6bf0134cc84838b3f1b58
Reviewed-on: http://gerrit.cloudera.org:8080/15394
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins


> Recover gracefully from corrupt kerberos credential cache
> ---------------------------------------------------------
>
>                 Key: KUDU-3050
>                 URL: https://issues.apache.org/jira/browse/KUDU-3050
>             Project: Kudu
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 1.11.1
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>             Fix For: 1.12.0
>
>
> This was originally filed as IMPALA-9359, but the code is copied from Kudu.
> The proposed change is to ensure that the kerberos renewal thread (running 
> the RenewThread() function) can recover if the kerberos credential cache is 
> corrupted. We saw this scenario once where /tmp filled up, the cache file was 
> somehow corrupted, and the daemon got wedged, unable to establish connections 
> once its tickets expired.
> I prototyped a fix where it reruns Kinit() to reinitialize the cache when it 
> encounters an error opening the cache.
> We may also want to adjust the backoff algorithm (since it backs off 
> exponentially with no real upper bound) and improve logging so that there is 
> more visibility into how the renewal thread is backing off.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
