[ 
https://issues.apache.org/jira/browse/KUDU-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928517#comment-15928517
 ] 

Alexey Serbin commented on KUDU-1917:
-------------------------------------

The current approach is the following: if the KDC is not available when a 
master or tablet server starts, and the cached tickets are expired, the 
process crashes with a message like:

{noformat}
F0316 10:40:09.024931 1964126208 master_main.cc:68] Check failed: _s.ok() Bad 
status: Runtime error: unable to kinit: unable to login from keytab: unable to 
reach any KDC in realm KRBTEST.COM, tried 1 KDC
{noformat}

{noformat}
F0316 10:40:12.902180 1964126208 tablet_server_main.cc:72] Check failed: 
_s.ok() Bad status: Runtime error: unable to kinit: unable to login from 
keytab: unable to reach any KDC in realm KRBTEST.COM, tried 1 KDC
{noformat}

If a master or tablet server is already up and running, the KDC can be 
restarted in the meantime.  The KDC can even stay offline past the ticket 
TTL -- that's fine.  During that KDC offline window, brand new client requests 
and requests with expired Kerberos tickets will fail with messages containing 
strings like:
{noformat}
Not authorized: Client connection negotiation failed: client connection to 
127.0.0.1:11082: GSSAPI Error:  The context has expire
{noformat}

{noformat}
Not authorized: Could not connect to the cluster: Client connection negotiation 
failed: client connection to 127.0.0.1:11082: Ticket expired
{noformat}

However, master and tablet servers continue running with no issues.
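
A test for the offline window could recognize these client-side failures by 
their text.  Below is a minimal sketch in Python, not Kudu code: the helper 
name is_kerberos_expiry_error is hypothetical, and treating the quoted 
substrings as stable markers is an assumption.

```python
# Substrings copied from the client errors quoted above. Relying on them as
# stable markers is an assumption made for illustration only.
EXPIRY_MARKERS = (
    "The context has expire",
    "Ticket expired",
)


def is_kerberos_expiry_error(message: str) -> bool:
    """Return True if `message` looks like an expired-ticket auth failure.

    Hypothetical helper: it only checks for the "Not authorized:" prefix
    and one of the marker substrings listed above.
    """
    if not message.startswith("Not authorized:"):
        return False
    return any(marker in message for marker in EXPIRY_MARKERS)
```

A test would feed the status string returned by a failed client call into 
this check and assert that it returns True while the KDC is down.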

Once the KDC is back, client requests can be authenticated given Kerberos 
credentials, and the system continues working as expected.
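
The recovery step could be exercised by retrying a client operation until 
authentication succeeds again.  A rough sketch in Python under stated 
assumptions: NotAuthorizedError and retry_until_authorized are hypothetical 
names, not part of any Kudu client API, and the sketch assumes the client's 
Kerberos credentials are refreshed outside this loop once the KDC is back.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotAuthorizedError(Exception):
    """Hypothetical stand-in for a client-side 'Not authorized' failure."""


def retry_until_authorized(
    op: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `op` until it stops raising NotAuthorizedError.

    Re-raises the last NotAuthorizedError if all attempts fail. Any other
    exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return op()
        except NotAuthorizedError as e:
            last_error = e
            # Wait before the next try, but not after the final attempt.
            if attempt + 1 < attempts:
                sleep(delay_s)
    raise last_error
```

Passing the sleep function as a parameter keeps the sketch testable without 
real delays.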

> Add tests which check for KDC fault conditions
> ----------------------------------------------
>
>                 Key: KUDU-1917
>                 URL: https://issues.apache.org/jira/browse/KUDU-1917
>             Project: Kudu
>          Issue Type: Test
>          Components: security, test
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Alexey Serbin
>
> We currently have no test coverage of what happens if the KDC has a temporary 
> outage on a secure cluster. We should ensure that, if the KDC is down, we 
> emit appropriate error messages to diagnose the issue and do not crash the 
> process, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
