[ 
https://issues.apache.org/jira/browse/SENTRY-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated SENTRY-2539:
--------------------------
    Description: 
Right now, for a command such as "show databases", Sentry has to perform 
authorization checks on each database. When there are many databases, for 
example 12000 databases in the system, the authorization checks for a single 
command in Sentry can be very slow. Two main factors slow down authorization 
checks in Sentry even when caching is enabled:

1) The cache returns the list of privileges as Strings. As a result, every 
authorization check has to convert each privilege string to a privilege 
object.

2) When the cache is enabled, it returns all privileges of a given user 
regardless of which resource is being checked.

  2.1) For example, a user has 2000 privileges assigned and the resource to 
check is "server=server1, database=db_1, table=table_1". The cache returns all 
2000 privileges, including unrelated privileges such as 
"server=server1->database=db_2->action=ALL".

  2.2) Returning unrelated privileges has two side effects:

    2.2.1) The overhead of converting privileges from String to object is 
proportional to the number of privileges returned from the cache. Converting 
unrelated privileges costs time and brings no benefit.

    2.2.2) The authorization check goes through each privilege, so its overhead 
is proportional to the number of privileges returned from the cache. Checking 
unrelated privileges costs time and brings no benefit.
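
The two costs above can be illustrated with a minimal sketch. It is not Sentry 
code: FlatCacheDemo, ParsedPrivilege and the counter are hypothetical names 
made up for this illustration, and the parsing is deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Sentry's classes.
class FlatCacheDemo {
    // Parsed form of a privilege string such as
    // "server=server1->database=db_2->action=ALL".
    static final class ParsedPrivilege {
        final List<String> parts = new ArrayList<>();

        ParsedPrivilege(String raw) {
            for (String part : raw.split("->")) {
                parts.add(part.trim());
            }
        }
    }

    // Counts string-to-object conversions across all checks.
    static int conversions = 0;

    // A flat cache hands back every privilege string of the user.
    static List<String> buildPrivileges(int count) {
        List<String> all = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            all.add("server=server1->database=db_" + i + "->action=ALL");
        }
        return all;
    }

    // One authorization check against the flat list.
    static boolean isAuthorized(List<String> cached, String database) {
        // Cost 2.2.1: every cached string is converted, related or not.
        List<ParsedPrivilege> parsed = new ArrayList<>();
        for (String raw : cached) {
            parsed.add(new ParsedPrivilege(raw));
            conversions++;
        }
        // Cost 2.2.2: the check may have to visit every converted privilege.
        for (ParsedPrivilege p : parsed) {
            if (p.parts.contains("database=" + database)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> cached = buildPrivileges(2000);
        boolean ok = isAuthorized(cached, "db_1");
        System.out.println(ok + " after " + conversions + " conversions");
    }
}
```

In this sketch a single check for db_1 converts all 2000 strings even though 
only one of them is related, and the same work is repeated on every check.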

Solution:

1) Add a new function listPrivilegeObjects that lets the authorization provider 
get privilege objects when checking authorization. This avoids the conversion 
overhead. All the interfaces from the policy engine (PolicyEngine) to the cache 
(PrivilegeCache) have to be changed to add this new function.

2) Implement a new cache, TreePrivilegeCache. It converts the privileges from 
String format to Privilege objects once, when the cache is built, and directly 
returns the privilege objects from listPrivilegeObjects at authorization check. 
This avoids the overhead of conversion at each authorization check.

3) TreePrivilegeCache organizes the privileges based on the resource hierarchy, 
like a tree. Therefore, it can return only related privileges based on the 
resource to check. This reduces the authorization check overhead. 

  3.1) For example, a user has 2000 privileges assigned, and the resource to 
check is "server=server1, database=db_1, table=table_1". TreePrivilegeCache 
returns only related privileges, excluding unrelated privileges such as 
"server=server1->database=db_2->action=ALL".

  3.2) SENTRY-1291 was meant to address problem 2). However, it did not address 
problem 1). In addition, its implementation, SimplePrivilegeCache, is neither 
memory efficient (the keys of the map contain the whole resource hierarchy, and 
many keys share a large portion of the same content) nor operationally 
efficient (for each authorization check, SimplePrivilegeCache.listPrivileges() 
has to construct a large number of keys in order to find all related privileges 
in the map).

4) Use TreePrivilegeCache instead of SimplePrivilegeCache for caching. Note 
that this solution is built on top of SENTRY-1291 and uses the changes 
SENTRY-1291 made, such as providing the resource hierarchy when getting 
privileges for an authorization check.
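
The tree-based lookup described in the solution can be sketched roughly as 
follows. This is not the TreePrivilegeCache from the attached patches: the class 
PrivilegeTreeSketch, its nested types, the signature given to 
listPrivilegeObjects and the default action are all assumptions made for 
illustration. The sketch parses each privilege string once at construction and 
returns only the privileges stored on the path from the root to the requested 
resource.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the TreePrivilegeCache from the SENTRY-2539 patch.
class PrivilegeTreeSketch {
    // Pre-parsed privilege: resource path plus action.
    static final class Privilege {
        final List<String> resourcePath;
        final String action;

        Privilege(List<String> resourcePath, String action) {
            this.resourcePath = resourcePath;
            this.action = action;
        }
    }

    // One node per resource level (server, database, table, ...).
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final List<Privilege> privileges = new ArrayList<>();
    }

    private final Node root = new Node();

    // Each string is converted exactly once, when the cache is built.
    PrivilegeTreeSketch(List<String> privilegeStrings) {
        for (String raw : privilegeStrings) {
            List<String> path = new ArrayList<>();
            String action = "ALL"; // assumed default for this sketch only
            for (String part : raw.split("->")) {
                String p = part.trim();
                if (p.startsWith("action=")) {
                    action = p.substring("action=".length());
                } else {
                    path.add(p);
                }
            }
            Node node = root;
            for (String key : path) {
                node = node.children.computeIfAbsent(key, k -> new Node());
            }
            node.privileges.add(new Privilege(path, action));
        }
    }

    // Collects the privileges granted on the resource and on its ancestors;
    // sibling subtrees such as database=db_2 are never visited.
    List<Privilege> listPrivilegeObjects(List<String> resourcePath) {
        List<Privilege> result = new ArrayList<>(root.privileges);
        Node node = root;
        for (String key : resourcePath) {
            node = node.children.get(key);
            if (node == null) {
                break;
            }
            result.addAll(node.privileges);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        for (int i = 1; i <= 2000; i++) {
            raw.add("server=server1->database=db_" + i + "->action=ALL");
        }
        PrivilegeTreeSketch cache = new PrivilegeTreeSketch(raw);
        List<Privilege> related = cache.listPrivilegeObjects(
            Arrays.asList("server=server1", "database=db_1", "table=table_1"));
        System.out.println(related.size() + " related privilege(s) returned");
    }
}
```

Compared with a map keyed by the full resource hierarchy, the nodes share 
common prefixes, and a lookup walks one path whose length is the depth of the 
resource rather than constructing many candidate keys. The sketch only covers 
privileges on the resource itself and its ancestors.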

  was:
Right now, the PolicyEngine Interface only returns the list of privileges in 
the form of String. As a result, every authorization check has to convert the 
privilege string to privilege object even though the cached privilege objects 
are of the correct type already.

We should add a new function that returns privilege object directly to avoid 
the overhead of conversion.


> PolicyEngine should be able to return privilege directly
> --------------------------------------------------------
>
>                 Key: SENTRY-2539
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2539
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major
>         Attachments: SENTRY-2539.002.patch, SENTRY-2539.003.patch, 
> SENTRY-2539.005.patch, SENTRY-2539.006.patch, SENTRY-2539.007.patch, 
> SENTRY-2539.008.patch, SENTRY-2539.008.patch, SENTRY-2539.008.patch
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
