[ https://issues.apache.org/jira/browse/SENTRY-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Na Li updated SENTRY-2539:
--------------------------
    Description:
Right now, for a command such as "show databases", Sentry has to perform an authorization check on each database. When there are many databases, for example 12000 databases in the system, the authorization checks for a single command can be very slow. Two main factors slow down authorization checks in Sentry even when caching is enabled:

1) The cache returns the list of privileges as Strings. As a result, every authorization check has to convert each privilege string to a privilege object.
2) When the cache is enabled, it returns all privileges of a given user regardless of which resource is being checked.
 2.1) For example, a user has 2000 privileges assigned and the resource to check is "server=server1, database=db_1, table=table_1". The cache returns all 2000 privileges, including unrelated privileges such as "server=server1->database=db_2->action=ALL".
 2.2) Returning unrelated privileges has two side effects:
  2.2.1) The overhead of converting privileges from String to object is proportional to the number of privileges returned from the cache. Converting unrelated privileges costs time and brings no benefit.
  2.2.2) The authorization check goes through each privilege, so its overhead is also proportional to the number of privileges returned from the cache. Checking unrelated privileges costs time and brings no benefit.

Solution:
1) Add a new function listPrivilegeObjects that lets the authorization provider get privilege objects when checking authorization. This avoids the conversion overhead. All the interfaces from the policy engine (PolicyEngine) to the cache (PrivilegeCache) have to be changed to add this new function.
2) Implement a new cache, TreePrivilegeCache. It converts each privilege from String format to a Privilege object once, when the cache is built, and returns the privilege objects directly from listPrivilegeObjects at authorization check time. This avoids the overhead of conversion at each authorization check.
3) TreePrivilegeCache organizes the privileges by resource hierarchy, like a tree. Therefore, it can return only the privileges related to the resource being checked. This reduces the authorization check overhead.
 3.1) For example, a user has 2000 privileges assigned, and the resource to check is "server=server1, database=db_1, table=table_1". TreePrivilegeCache returns only the related privileges and excludes unrelated privileges such as "server=server1->database=db_2->action=ALL".
 3.2) SENTRY-1291 was meant to address problem 2). However, it did not address problem 1). In addition, its implementation, SimplePrivilegeCache, is neither memory efficient (each key of the map contains the whole resource hierarchy, so many keys share a large portion of the same content) nor operationally efficient (for each authorization check, SimplePrivilegeCache.listPrivileges() has to construct a large number of keys in order to find all related privileges in the map).
4) Use TreePrivilegeCache instead of SimplePrivilegeCache for caching. Note that this solution is built on top of SENTRY-1291 and uses the changes SENTRY-1291 made, such as providing the resource hierarchy when getting privileges for an authorization check.

  was:
Right now, the PolicyEngine Interface only returns the list of privileges in the form of String. As a result, every authorization check has to convert the privilege string to privilege object even though the cached privilege objects are of the correct type already.

We should add a new function that returns privilege object directly to avoid the overhead of conversion.
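
To illustrate solution items 2) and 3) above, here is a minimal sketch of a tree-organized cache that parses privilege strings once and returns only the privileges on the path to the requested resource. It is not the code from the attached patches: the class name TreePrivilegeCacheSketch, the ParsedPrivilege type, and the listPrivilegeObjects signature shown here are assumptions made for illustration, and the sketch ignores wildcards, privileges granted on child resources, and per-user or per-role scoping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not the actual Sentry TreePrivilegeCache. */
class TreePrivilegeCacheSketch {

    /** A privilege parsed once from a string such as "server=server1->database=db_2->action=ALL". */
    static final class ParsedPrivilege {
        final List<String> resourcePath = new ArrayList<>(); // e.g. [server=server1, database=db_2]
        String action = "";

        ParsedPrivilege(String privilege) {
            for (String part : privilege.split("->")) {
                if (part.startsWith("action=")) {
                    action = part.substring("action=".length());
                } else {
                    resourcePath.add(part);
                }
            }
        }
    }

    /** One node per resource level; holds the privileges granted exactly at that level. */
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final List<ParsedPrivilege> privileges = new ArrayList<>();
    }

    private final Node root = new Node();

    /** String-to-object conversion happens here, once, when the cache is built. */
    TreePrivilegeCacheSketch(List<String> privilegeStrings) {
        for (String s : privilegeStrings) {
            ParsedPrivilege p = new ParsedPrivilege(s);
            Node node = root;
            for (String level : p.resourcePath) {
                node = node.children.computeIfAbsent(level, k -> new Node());
            }
            node.privileges.add(p);
        }
    }

    /** Walks down the requested path and collects the privileges stored at each level visited. */
    List<ParsedPrivilege> listPrivilegeObjects(String... resourcePath) {
        List<ParsedPrivilege> result = new ArrayList<>();
        Node node = root;
        for (String level : resourcePath) {
            node = node.children.get(level);
            if (node == null) {
                break; // nothing is cached below this level
            }
            result.addAll(node.privileges);
        }
        return result;
    }

    public static void main(String[] args) {
        TreePrivilegeCacheSketch cache = new TreePrivilegeCacheSketch(Arrays.asList(
                "server=server1->database=db_1->action=SELECT",
                "server=server1->database=db_1->table=table_1->action=INSERT",
                "server=server1->database=db_2->action=ALL"));
        for (ParsedPrivilege p :
                cache.listPrivilegeObjects("server=server1", "database=db_1", "table=table_1")) {
            System.out.println(p.resourcePath + " " + p.action);
        }
    }
}
```

In this sketch a lookup for table_1 in db_1 visits at most three nodes, so the db_2 privilege is never touched, and no string parsing happens at check time. Shared prefixes such as "server=server1" are stored once as tree nodes rather than repeated in every map key.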
> PolicyEngine should be able to return privilege directly
> ---------------------------------------------------------
>
>                 Key: SENTRY-2539
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2539
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major
>         Attachments: SENTRY-2539.002.patch, SENTRY-2539.003.patch,
> SENTRY-2539.005.patch, SENTRY-2539.006.patch, SENTRY-2539.007.patch,
> SENTRY-2539.008.patch, SENTRY-2539.008.patch, SENTRY-2539.008.patch

--
This message was sent by Atlassian Jira
(v8.3.4#803005)