[
https://issues.apache.org/jira/browse/KUDU-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928517#comment-15928517
]
Alexey Serbin commented on KUDU-1917:
-------------------------------------
The current behavior is as follows: if the KDC is unavailable when a master or
tablet server starts and the cached tickets have expired, the process crashes
with a message like:
{noformat}
F0316 10:40:09.024931 1964126208 master_main.cc:68] Check failed: _s.ok() Bad
status: Runtime error: unable to kinit: unable to login from keytab: unable to
reach any KDC in realm KRBTEST.COM, tried 1 KDC
{noformat}
{noformat}
F0316 10:40:12.902180 1964126208 tablet_server_main.cc:72] Check failed:
_s.ok() Bad status: Runtime error: unable to kinit: unable to login from
keytab: unable to reach any KDC in realm KRBTEST.COM, tried 1 KDC
{noformat}
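A test that checks for this startup failure could look for the KDC message in the captured log. The following is only a minimal sketch: the function name and the regular expression are hypothetical, built from the two fatal messages quoted above, and it assumes the message wording stays stable across versions.

```python
import re

# Pattern derived from the fatal messages quoted above. Assumption: the
# wording "unable to reach any KDC in realm X, tried N KDC" does not change.
_KDC_UNREACHABLE = re.compile(
    r"unable to reach any KDC in realm (?P<realm>[^\s,]+), tried (?P<tried>\d+) KDC"
)


def parse_kdc_startup_failure(log_text):
    """Return (realm, kdcs_tried) if the log reports an unreachable KDC, else None."""
    # Fatal lines may be wrapped across several lines, so collapse every
    # run of whitespace into a single space before matching.
    flat = " ".join(log_text.split())
    match = _KDC_UNREACHABLE.search(flat)
    if match is None:
        return None
    return match.group("realm"), int(match.group("tried"))


sample = """F0316 10:40:09.024931 1964126208 master_main.cc:68] Check failed: _s.ok() Bad
status: Runtime error: unable to kinit: unable to login from keytab: unable to
reach any KDC in realm KRBTEST.COM, tried 1 KDC"""

print(parse_kdc_startup_failure(sample))
```

Collapsing whitespace first matters here because the message is split across wrapped lines in both samples.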
If a master or tablet server is already up and running, the KDC can be
restarted in the meantime. The KDC can even stay offline past the ticket TTL;
that's fine. During that offline window, brand new client requests and requests
with expired Kerberos tickets fail with messages containing strings like:
{noformat}
Not authorized: Client connection negotiation failed: client connection to
127.0.0.1:11082: GSSAPI Error: The context has expire
{noformat}
{noformat}
Not authorized: Could not connect to the cluster: Client connection negotiation
failed: client connection to 127.0.0.1:11082: Ticket expired
{noformat}
However, the master and tablet servers continue running with no issues.
Once the KDC is back, client requests can be authenticated given Kerberos
credentials, and the system continues working as expected.
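From the client side, the behavior described above suggests treating these errors as transient while the KDC is down. The sketch below is an illustration only, not Kudu client code: `operation` is a hypothetical callable that raises `RuntimeError` with the error text, the marker strings come from the two client messages quoted above, and it assumes the credentials are renewed out of band (for example by a fresh kinit) once the KDC is reachable again.

```python
import time

# Substrings taken from the client errors quoted above. Assumption: these
# two markers are enough to recognize a KDC-related authentication failure.
_KERBEROS_ERROR_MARKERS = ("GSSAPI Error", "Ticket expired")


def is_kerberos_auth_failure(message):
    """True if the error text looks like one of the Kerberos failures shown above."""
    return message.startswith("Not authorized") and any(
        marker in message for marker in _KERBEROS_ERROR_MARKERS
    )


def call_with_kdc_retry(operation, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call operation(), retrying only on Kerberos-looking failures.

    `operation` is a hypothetical stand-in for a real client call; it is
    expected to raise RuntimeError carrying the server's error text.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except RuntimeError as err:
            # Re-raise anything unrelated to Kerberos, and give up after
            # the final attempt.
            if not is_kerberos_auth_failure(str(err)) or attempt == attempts:
                raise
            # Wait for the KDC to come back and for tickets to be renewed.
            sleep(delay_s)
```

Passing `sleep` as a parameter keeps the helper easy to exercise in a test without real delays.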
> Add tests which check for KDC fault conditions
> ----------------------------------------------
>
> Key: KUDU-1917
> URL: https://issues.apache.org/jira/browse/KUDU-1917
> Project: Kudu
> Issue Type: Test
> Components: security, test
> Affects Versions: 1.3.0
> Reporter: Todd Lipcon
> Assignee: Alexey Serbin
>
> We currently have no test coverage of what happens if the KDC has a temporary
> outage on a secure cluster. We should ensure that, if the KDC is down, we
> emit appropriate error messages to diagnose the issue and do not crash the
> process, etc.