[ https://issues.apache.org/jira/browse/SENTRY-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007456#comment-15007456 ]

Li Li commented on SENTRY-565:
------------------------------

[~colinma] I think this is a very good and important feature for Sentry, one 
that could improve performance a lot.
At the same time, it is also an error-prone feature. So it may be better to 
list multiple solutions with their pros and cons, along with quantitative 
results (e.g., performance improves X times at Y queries per minute). You could 
also invite more people to review and discuss it, and add some concurrency and 
scalability tests to verify your implementation.

My concern here is about refreshing the cache. In your current implementation, 
the cache is only refreshed when an entry expires, which may leave outdated 
privilege data in it. I think that for Sentry, consistent privilege information 
is the most important thing, so it is better to update the cache whenever 
privileges change.
And if refreshing the cache takes a long time, it may hurt performance. We may 
need to use a background thread to do the refresh.
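One way to combine both suggestions (update on change, refresh in the background) is sketched below in Java. It is a minimal sketch under stated assumptions, not the SENTRY-565 patch or a Sentry API: `PrivilegeSource`, `RefreshingPrivilegeCache`, and `onPrivilegeUpdate` are hypothetical names, the RPC to the Sentry service is abstracted as `PrivilegeSource.fetchPrivileges`, and it assumes something calls `onPrivilegeUpdate` whenever a grant or revoke happens.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the RPC that fetches a group's privileges from the Sentry service. */
interface PrivilegeSource {
  Set<String> fetchPrivileges(String group);
}

/**
 * Hypothetical privilege cache refreshed in two ways:
 *  1. immediately, when onPrivilegeUpdate() reports a grant/revoke for a group;
 *  2. periodically, on a background daemon thread, as a safety net for missed updates.
 * Readers of an already-cached group never wait for a refresh; they see the last snapshot.
 */
class RefreshingPrivilegeCache implements AutoCloseable {
  private final PrivilegeSource source;
  private final ConcurrentHashMap<String, Set<String>> cache = new ConcurrentHashMap<>();
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "privilege-cache-refresher");
        t.setDaemon(true);
        return t;
      });

  RefreshingPrivilegeCache(PrivilegeSource source, long refreshPeriodSeconds) {
    this.source = source;
    refresher.scheduleWithFixedDelay(
        this::refreshAll, refreshPeriodSeconds, refreshPeriodSeconds, TimeUnit.SECONDS);
  }

  /** Returns the cached privileges, loading them on first access. */
  Set<String> getPrivileges(String group) {
    Set<String> cached = cache.get(group); // lock-free read for the common case
    return cached != null ? cached : cache.computeIfAbsent(group, this::load);
  }

  /** Called when a grant/revoke for this group is known to have been committed. */
  void onPrivilegeUpdate(String group) {
    // Per-key compute serializes with the periodic refresh of the same key,
    // so an older refresh result cannot overwrite this fresher value.
    cache.compute(group, (g, old) -> load(g));
  }

  private Set<String> load(String group) {
    return Collections.unmodifiableSet(source.fetchPrivileges(group));
  }

  private void refreshAll() {
    for (String group : cache.keySet()) {
      try {
        cache.computeIfPresent(group, (g, old) -> load(g));
      } catch (RuntimeException e) {
        // Keep the previous snapshot; the next run or an explicit update retries.
        // Catching here also keeps the scheduled task alive after an RPC failure.
      }
    }
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }
}
```

In this design the periodic refresh only covers missed notifications; consistency comes from the explicit update. The trade-off is that `compute` holds a per-bin lock while the RPC runs, which briefly blocks other writers that hash to the same bin.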

Besides, [~anneyu] mentioned some concerns about the cache implementation, 
which I think are also very important: concurrency and retiring (evicting) 
cached items.
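For the two points raised by [~anneyu], a minimal Java sketch of a thread-safe cache that retires entries both by size (least recently used) and by age (time to live) follows. It uses only the JDK (`LinkedHashMap` in access order with `removeEldestEntry`); `BoundedExpiringCache` and its parameters are hypothetical, and a single coarse `synchronized` lock is chosen for simplicity rather than throughput.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical cache that retires entries in two ways:
 *  - size bound: once maxEntries is exceeded, the least recently used entry is evicted;
 *  - age bound: an entry older than ttlMillis is treated as missing and reloaded.
 * All access goes through synchronized methods, so instances can be shared across threads.
 */
class BoundedExpiringCache<K, V> {
  /** A cached value together with the time it was loaded. */
  private static final class Timestamped<T> {
    final T value;
    final long loadedAtMillis;

    Timestamped(T value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final long ttlMillis;
  private final LongSupplier clockMillis; // injectable clock, e.g. System::currentTimeMillis
  private final LinkedHashMap<K, Timestamped<V>> map;

  BoundedExpiringCache(int maxEntries, long ttlMillis, LongSupplier clockMillis) {
    this.ttlMillis = ttlMillis;
    this.clockMillis = clockMillis;
    // accessOrder = true: iteration runs from least to most recently accessed,
    // so the "eldest" entry is the least recently used one.
    this.map = new LinkedHashMap<K, Timestamped<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Timestamped<V>> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached value, (re)loading it if it is absent or expired. */
  synchronized V get(K key, Function<? super K, ? extends V> loader) {
    long now = clockMillis.getAsLong();
    Timestamped<V> entry = map.get(key);
    if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
      entry = new Timestamped<>(loader.apply(key), now); // loader runs under the lock
      map.put(key, entry);
    }
    return entry.value;
  }

  /** Drops an entry immediately, e.g. after a privilege update. */
  synchronized void invalidate(K key) {
    map.remove(key);
  }

  synchronized int size() {
    return map.size();
  }
}
```

Because the loader runs while the lock is held, one slow RPC blocks every reader; a production version would load outside the lock or use an existing concurrent cache such as Guava's `CacheBuilder` (`maximumSize`, `expireAfterWrite`, `refreshAfterWrite`).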



> Improvement the performance when Sentry filter the entity
> ---------------------------------------------------------
>
>                 Key: SENTRY-565
>                 URL: https://issues.apache.org/jira/browse/SENTRY-565
>             Project: Sentry
>          Issue Type: Improvement
>            Reporter: Colin Ma
>            Assignee: Colin Ma
>         Attachments: SENTRY-565.001.patch, SENTRY-565.002.patch, SENTRY-565.003.patch, SENTRY-565.004.patch, SENTRY-565.005.patch
>
>
> Currently, when getting metadata from Hive (e.g., "show tables", "show 
> databases"), Sentry filters the result and outputs only the authorized 
> entities. Filtering the result requires many RPC calls. The related code is 
> in HiveAuthzBinding; for example, in filterShowTables:
> {code}
> ......
> for (String tableName : queryResult) {
>   ......
>   hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject,
>             inputHierarchy, outputHierarchy, providedPrivileges);
>   ......
> }
> ......
> {code}
> hiveAuthzBinding.authorize fetches the privileges from the Sentry service on 
> each call, so if there are many tables in Hive, the filtering process takes a 
> long time. Since Sentry also needs to filter columns, HiveAuthzBinding should 
> be improved to reduce the number of RPC calls made during filtering.
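The RPC-per-table loop described above could be replaced by fetching the subject's privileges once and filtering locally. The Java sketch below illustrates only that idea: `TablePrivilegeRpc` and `ShowTablesFilter` are hypothetical, and it assumes the service can return the set of table names a user may see in a database. Real Sentry privileges form a hierarchy (server, database, table, column) in which, for example, a database-level grant covers its tables, so a real local check would also have to evaluate those implications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical RPC: names of tables in `db` on which `user` holds any privilege. */
interface TablePrivilegeRpc {
  Set<String> authorizedTables(String user, String db);
}

final class ShowTablesFilter {
  private ShowTablesFilter() {}

  /**
   * Batched filtering sketch: one RPC per "show tables" instead of one per table,
   * followed by a local set lookup per table name. Input order is preserved.
   */
  static List<String> filterShowTables(
      TablePrivilegeRpc rpc, String user, String db, List<String> queryResult) {
    Set<String> allowed = rpc.authorizedTables(user, db); // single round trip
    List<String> filtered = new ArrayList<>();
    for (String tableName : queryResult) {
      if (allowed.contains(tableName)) {
        filtered.add(tableName);
      }
    }
    return filtered;
  }
}
```

For N tables this makes one round trip instead of N; the same batching idea would apply to column filtering.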



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
