Tushar I created SENTRY-1703:
--------------------------------

             Summary: Solr-Sentry in kerberos mode makes too many KDC requests 
and returns unauthorized on KDC timeout
                 Key: SENTRY-1703
                 URL: https://issues.apache.org/jira/browse/SENTRY-1703
             Project: Sentry
          Issue Type: Bug
          Components: Solr Plugin
    Affects Versions: 1.5.1
            Reporter: Tushar I


Sentry Version: 1.5.1-cdh5.8.0

We are seeing intermittent authorization failures with Sentry Solr plugin in a 
Kerberos environment.

1. We are writing to Solr using the SolrJ client from within Spark jobs in a 
multi-node Spark/Hadoop cluster, and individual Spark tasks frequently get 
authorization errors from Solr saying "User XX does not have privileges for 
YYcollection", which are generated by the Solr-Sentry plugin. (The user does 
have access to the collection, and the same calls succeed the rest of the time.)
2. The root cause seems to be that on every Solr call from the client, Sentry 
reaches out to the KDC on behalf of solr/hostname to check whether user XX has 
permission on YYcollection, flooding the KDC with many requests per second. 
At some point a request fails on a KDC timeout, and Sentry throws 
{{org.apache.sentry.binding.solr.authz.SentrySolrAuthorizationException: User 
XX does not have privileges for YYcollection}} to the calling client.

I haven't had enough time to investigate why Sentry is making so many KDC 
calls. It may be doing so for each document in a batched Solr operation, or 
it may log in from the keytab each time without caching the ticket.

Caching the result of {{authProvider.hasAccess()}} in SolrAuthzBinding.java for 
a reasonably short time might not be a bad idea.
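
To illustrate the caching idea, here is a minimal sketch of a short-lived decision cache. It is not Sentry code: the class {{AuthzDecisionCache}}, its method names, and the TTL value are all hypothetical, and the real {{authProvider.hasAccess()}} call is stood in for by a {{BooleanSupplier}} because its actual signature is not shown here. The sketch caches only positive decisions, on the assumption that a denial caused by a KDC timeout should not be remembered.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of Sentry): remembers "allowed" decisions for a short TTL. */
final class AuthzDecisionCache {
    // Maps a (user, collection, action) key to the time at which the cached grant expires.
    private final ConcurrentHashMap<String, Long> grantExpiry = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so the expiry logic can be tested

    AuthzDecisionCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns a cached grant if one is still fresh; otherwise asks the backend.
     * The backend stands in for the real authorization check, which is assumed
     * to be the expensive step that ends up contacting the KDC.
     */
    boolean hasAccess(String user, String collection, String action, BooleanSupplier backend) {
        String key = user + '\0' + collection + '\0' + action;
        long now = clock.getAsLong();
        Long expiresAt = grantExpiry.get(key);
        if (expiresAt != null && now < expiresAt) {
            return true; // fresh grant, no backend call
        }
        boolean allowed = backend.getAsBoolean();
        if (allowed) {
            grantExpiry.put(key, now + ttlMillis);
        } else {
            // Denials are never cached, so a transient failure is retried on the next call.
            grantExpiry.remove(key);
        }
        return allowed;
    }
}
```

A caller would construct it once, for example {{new AuthzDecisionCache(30_000, System::currentTimeMillis)}}, and route each check through {{hasAccess}}. The trade-off is that a revoked privilege stays effective for up to one TTL, so the TTL should be kept short.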

My question in the meantime is: Are there any tuning knobs to somehow reduce 
the load on KDC, or increase the KDC request timeout value, or anything along 
these lines?

Relevant stacktraces captured from Solr Admin are attached:
1. stacktrace1.log: The KDC timeout on the Sentry call
2. stacktrace2.log: Sentry failing to authenticate with the KDC due to #1 above
3. stacktrace3.log: SolrException when {{authProvider.hasAccess()}} returns 
false due to #2 above

Also attached is a _snippet_ from the KDC log - the full log bloats to 17 MB 
within a minute, full of messages like: 
{code}
Apr 10 17:06:37 a0 krb5kdc[20427](info): TGS_REQ (1 etypes {23}) 10.0.0.1: 
ISSUE: authtime 1491818430, etypes {rep=23 tkt=23 ses=23}, solr/[email protected] 
for sentry/[email protected]
{code}

This is reproducible in two separate clusters with different environments: 
CDH 5.10.1 and CDH 5.8.0.

Please let me know if I've left out any key information.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
