[ 
https://issues.apache.org/jira/browse/SENTRY-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tushar I updated SENTRY-1703:
-----------------------------
    Description: 
Sentry Version: 1.5.1-cdh5.8.0

We are seeing intermittent authorization failures with Sentry Solr plugin in a 
Kerberos environment.

1. We are writing to Solr using the SolrJ client from within Spark jobs in a 
multi-node Spark/Hadoop cluster, and individual Spark tasks frequently get 
authorization errors from Solr saying "User XX does not have privileges for 
YYcollection", which are generated by the Solr-Sentry plugin. (The user does 
have access to the collection, and the same calls succeed the rest of the time.)
2. The root cause seems to be that on every Solr call from the client, Sentry 
contacts the KDC on behalf of solr/hostname, flooding the KDC with requests 
every second. At some point a request fails on a KDC timeout, and Sentry throws 
{{org.apache.sentry.binding.solr.authz.SentrySolrAuthorizationException: User 
XX does not have privileges for YYcollection}} to the calling client.

I have not had time to investigate why Sentry makes so many KDC calls. It may 
be contacting the KDC for each document in a batched Solr operation, or logging 
in from the keytab on every call without caching the ticket.

Caching the result of {{authProvider.hasAccess()}} in SolrAuthzBinding.java for 
a reasonably short time might help.
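
A minimal sketch of that caching idea follows. It is not Sentry code: 
{{TtlAuthzCache}}, its {{check}} method, and the injected clock are 
hypothetical names, and the {{BooleanSupplier}} stands in for the real call to 
{{authProvider.hasAccess()}}. The sketch caches only grants, on the assumption 
that a KDC failure surfaces as a {{false}} result (as in stacktrace3) and should 
not be remembered.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Sentry): remembers positive authorization
 * decisions for a short time so repeated checks skip the backend call.
 */
final class TtlAuthzCache {
    // Maps "user, collection, action" to the time (ms) at which the grant expires.
    private final Map<String, Long> grantExpiry = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so the TTL can be tested

    TtlAuthzCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns true if access is granted. The backend (assumed to wrap
     * authProvider.hasAccess()) is called only when no unexpired grant exists.
     */
    boolean check(String user, String collection, String action, BooleanSupplier backend) {
        String key = user + '\0' + collection + '\0' + action;
        long now = clock.getAsLong();
        Long expiresAt = grantExpiry.get(key);
        if (expiresAt != null && now < expiresAt) {
            return true; // unexpired cached grant, no backend call
        }
        boolean allowed = backend.getAsBoolean();
        if (allowed) {
            grantExpiry.put(key, now + ttlMillis);
        } else {
            // Denials are never cached: they may be spurious (e.g. a KDC timeout).
            grantExpiry.remove(key);
        }
        return allowed;
    }
}
```

In production the clock would be {{System::currentTimeMillis}}. The map here 
is unbounded and expired entries are only overwritten, never evicted, so a real 
implementation would need a size limit. A short TTL also delays the effect of a 
revoked privilege by up to that TTL, which is the trade-off of this approach.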

My question in the meantime is: Are there any tuning knobs to somehow reduce 
the load on KDC, or increase the KDC request timeout value, or anything along 
these lines?

Relevant stacktraces captured from Solr Admin are attached:
1. stacktrace1.log: The KDC timeout on the Sentry call
2. stacktrace2.log: Sentry failing to authenticate with the KDC because of #1
3. stacktrace3.log: The SolrException raised when {{authProvider.hasAccess()}} 
returns false because of #2

Also attached is a _snippet_ from the KDC log. The full log grows to 17 MB 
within a minute and is full of messages like: 
{code}
Apr 10 17:06:37 a0 krb5kdc[20427](info): TGS_REQ (1 etypes {23}) 10.0.0.1: 
ISSUE: authtime 1491818430, etypes {rep=23 tkt=23 ses=23}, solr/[email protected] 
for sentry/[email protected]
{code}

This is reproducible in two separate clusters with different environments:
CDH 5.10.1 and
CDH 5.8.0

Please let me know if I've left out any key information.



  was:
Sentry Version: 1.5.1-cdh5.8.0

We are seeing intermittent authorization failures with Sentry Solr plugin in a 
Kerberos environment.

1. We are writing to Solr using the SolrJ client from within Spark jobs in a 
multi-node Spark/Hadoop cluster and frequently get authorization errors from 
Solr in individual spark tasks saying "User XX does not have privileges for 
YYcollection" which are generated by the Solr-Sentry plugin. (The user very 
well has access to the collection and it works fine rest of the times).
2. The root cause seems to be that on every Solr call from the client, Sentry 
reaches out to KDC on behalf of solr/hostname to check if user XX has 
permission on the YYcollection, thereby drowning the KDC in tons of requests 
per second, and at some point fails on a KDC timeout, throwing the exception: 
{{org.apache.sentry.binding.solr.authz.SentrySolrAuthorizationException: User 
XX does not have privileges for YYcollection}} to the calling client.

I didn't get enough time to investigate why Sentry is making so many KDC calls, 
maybe it's doing it for each document in a batched Solr operation, or it logs 
in using keytab each time and doesn't cache the ticket, etc.

Caching the result of {{authProvider.hasAccess()}} in SolrAuthzBinding.java for 
a reasonably short time might not be a bad idea.

My question in the meantime is: Are there any tuning knobs to somehow reduce 
the load on KDC, or increase the KDC request timeout value, or anything along 
these lines?

Relevant stacktraces captured from Solr Admin are attached:
1. stacktrace1.log : The timeout from KDC for sentry call
2. stacktrace2.log: When Sentry cannot authenticate with KDC due to # 1 above
3. stacktrace3.log: SolrException when {{authProvider.hasAccess()}} returns 
false due to # 2 above.

Also attached is a _snippet_ from the KDC log - the full log bloats to 17 MB 
within a minute, full of messages like: 
{code}
Apr 10 17:06:37 a0 krb5kdc[20427](info): TGS_REQ (1 etypes {23}) 10.0.0.1: 
ISSUE: authtime 1491818430, etypes {rep=23 tkt=23 ses=23}, solr/[email protected] 
for sentry/[email protected]
{code}

This is reproducible in two separate clusters with different environments:
CDH 5.10.1 and
CDH 5.8.0

Please let me know if I've left out any key information.




> Solr-Sentry in kerberos mode makes too many KDC requests and returns 
> unauthorized on KDC timeout
> ------------------------------------------------------------------------------------------------
>
>                 Key: SENTRY-1703
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1703
>             Project: Sentry
>          Issue Type: Bug
>          Components: Solr Plugin
>    Affects Versions: 1.5.1
>            Reporter: Tushar I
>         Attachments: kdc.log.txt, stacktrace1.log.txt, stacktrace2.log.txt, 
> stacktrace3.log.txt
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
