Hi Donghoon,

For the first experiment, you cached all privileges in local memory and did the 
authorization against that cache. However, the cache is never refreshed, so I see 
it as a temporary cache to work around the performance problem. For Sentry, we 
should have an overall design for the local cache. For example, what is the 
caching policy: fetch additional privileges on demand, or fetch all privileges 
up front? How is the local cache refreshed: pull or push? And so on.
Feel free to discuss.
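To make the pull-versus-push question concrete, here is a minimal sketch of a per-group cache that supports both. Entries are re-fetched (pulled) once they are older than a TTL, and `invalidate()` lets a push notification drop an entry early. `TtlPrivilegeCache`, the `loader` function, the injected `clock` (e.g. `System::currentTimeMillis` in production) and the TTL policy are all hypothetical illustrations, not existing Sentry APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch, not a Sentry API: a per-group privilege cache that is
// refreshed by pull (TTL expiry) and can also be invalidated by push.
class TtlPrivilegeCache {
    // Cached privileges for one group and the time they were loaded.
    private record Entry(List<String> privileges, long loadedAt) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, List<String>> loader; // fetches a group's privileges from the store
    private final LongSupplier clock;                     // current time in milliseconds
    private final long ttlMillis;

    TtlPrivilegeCache(Function<String, List<String>> loader, LongSupplier clock, long ttlMillis) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    // Pull: reload the group's privileges if missing or older than the TTL.
    List<String> get(String group) {
        long now = clock.getAsLong();
        return entries.compute(group, (g, old) ->
                (old == null || now - old.loadedAt() >= ttlMillis)
                        ? new Entry(List.copyOf(loader.apply(g)), now)
                        : old).privileges();
    }

    // Push: drop a group's entry when the server reports that its privileges changed.
    void invalidate(String group) {
        entries.remove(group);
    }
}
```

With pull alone, a revoked privilege can stay effective for up to one TTL interval; push-based invalidation shortens that window at the cost of a notification channel from the Sentry service.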

Best regards,

Colin Ma (Ma Jun Jie)

-----Original Message-----
From: nazgul33 [mailto:[email protected]] 
Sent: Thursday, March 24, 2016 1:13 PM
To: [email protected]
Subject: [discuss] ResourceAuthorizationProvider.hasAccess() performance

-- resending mail to [email protected]. I accidentally sent mail to 
[email protected] --

Hi, all

I'm Donghoon Han, a software developer.
I subscribed to this mailing list to discuss a performance problem in Sentry.

My current job is to develop a type 3 JDBC driver for universal access to Hive, 
Impala and Phoenix, which has its own Sentry enforcement logic.
We have 1000 users and 10k tables in a single Hive metastore. (Huge!)

One problematic requirement is implementing the JDBC API metadata.getTables() so 
that it lists only the tables the connected user has access to.

With 1000 public tables out of 10k tables and 1000 public users, evaluating all 
tables with the current logic takes about 200 seconds.

I found that a single hasAccess() call:
- lists the rules in the groups the user belongs to (getPrivileges()),
- iterates through those rules and tests the requested privilege against each,
- while iterating, converts each text-based rule into a Privilege object, which 
  is then discarded.
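The per-call flow above can be sketched roughly as follows. This is a simplified, hypothetical model rather than Sentry's code: `UncachedAuthorizer`, the `rulesByGroup` map (standing in for getPrivileges()), and the `Privilege` record with its `parse`/`implies` logic are assumptions for illustration, with rules written as `key=value` parts joined by `->`. The `parseCount` field makes visible that every call re-parses each rule it scans.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the per-call flow (not Sentry's code):
// every hasAccess() call re-parses the text rules it scans.
class UncachedAuthorizer {
    // Hypothetical parsed privilege: parts such as server=server1, db=default, action=select.
    record Privilege(Map<String, String> parts) {
        static Privilege parse(String rule) {
            Map<String, String> parsed = new HashMap<>();
            for (String part : rule.toLowerCase().split("->")) {
                String[] kv = part.split("=", 2);
                parsed.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
            }
            return new Privilege(parsed);
        }

        // True if every non-action part matches the request and the action covers it.
        boolean implies(Privilege request) {
            for (Map.Entry<String, String> e : parts.entrySet()) {
                if (e.getKey().equals("action")) continue;
                String v = e.getValue();
                if (!v.equals("*") && !v.equals(request.parts().get(e.getKey()))) return false;
            }
            String action = parts.getOrDefault("action", "all");
            return action.equals("all") || action.equals("*")
                    || action.equals(request.parts().get("action"));
        }
    }

    private final Map<String, List<String>> rulesByGroup; // group -> text rules
    int parseCount = 0;                                   // number of rule parses so far

    UncachedAuthorizer(Map<String, List<String>> rulesByGroup) {
        this.rulesByGroup = rulesByGroup;
    }

    boolean hasAccess(List<String> groups, String requested) {
        Privilege request = Privilege.parse(requested);
        for (String group : groups) {
            for (String rule : rulesByGroup.getOrDefault(group, List.of())) {
                parseCount++;
                // The parsed Privilege is used once and then discarded.
                if (Privilege.parse(rule).implies(request)) return true;
            }
        }
        return false;
    }
}
```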

I suspected that the performance problem lies in converting text rules into 
Privilege objects on every call, so I wrote some code to experiment.

= first experiment =
1. Get the user's groups.
2. List the rules in those groups and convert them to Privilege objects.
3. Put them in a HashMap<String, List<Privilege>>, where the key is the group name.
4. Subsequent evaluations use the Privilege objects cached in the HashMap.

This decreased the full evaluation time from 200 seconds to 10 seconds.
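A minimal sketch of the first experiment, under the same simplified assumptions (the hypothetical `Privilege` record is repeated so the example stands alone, and `CachedAuthorizer` and its `store` map are illustrative names, not Sentry classes). Each group's rules are parsed once, on first use, and the resulting objects are kept in a `HashMap<String, List<Privilege>>`. Like the experiment, this cache is never refreshed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the first experiment (hypothetical, not Sentry's code):
// parse each group's rules once and reuse the Privilege objects.
class CachedAuthorizer {
    // Same simplified privilege model: key=value parts joined by "->".
    record Privilege(Map<String, String> parts) {
        static Privilege parse(String rule) {
            Map<String, String> parsed = new HashMap<>();
            for (String part : rule.toLowerCase().split("->")) {
                String[] kv = part.split("=", 2);
                parsed.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
            }
            return new Privilege(parsed);
        }

        boolean implies(Privilege request) {
            for (Map.Entry<String, String> e : parts.entrySet()) {
                if (e.getKey().equals("action")) continue;
                String v = e.getValue();
                if (!v.equals("*") && !v.equals(request.parts().get(e.getKey()))) return false;
            }
            String action = parts.getOrDefault("action", "all");
            return action.equals("all") || action.equals("*")
                    || action.equals(request.parts().get("action"));
        }
    }

    private final Map<String, List<String>> store;                   // stands in for the policy store
    private final Map<String, List<Privilege>> cache = new HashMap<>(); // group -> pre-built Privileges
    int parseCount = 0;                                              // number of rule parses so far

    CachedAuthorizer(Map<String, List<String>> store) {
        this.store = store;
    }

    // Steps 2 and 3: convert a group's rules once, then keep them in the HashMap.
    private List<Privilege> privilegesFor(String group) {
        return cache.computeIfAbsent(group, g -> {
            List<Privilege> parsed = new ArrayList<>();
            for (String rule : store.getOrDefault(g, List.of())) {
                parsed.add(Privilege.parse(rule));
                parseCount++;
            }
            return parsed;
        });
    }

    // Step 4: later evaluations only walk the cached objects.
    boolean hasAccess(List<String> groups, String requested) {
        Privilege request = Privilege.parse(requested);
        for (String group : groups) {
            for (Privilege p : privilegesFor(group)) {
                if (p.implies(request)) return true;
            }
        }
        return false;
    }
}
```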

= second experiment =
- Most users are public users with SELECT only.
- EnumSet.allOf(DBModelAction.class) iterates in the order INSERT, SELECT, ALL, 
  so I changed the evaluation order to SELECT, INSERT, ALL.

10 seconds -> 5 seconds.
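Why the order matters can be shown with a small sketch, assuming each action is checked in turn and the loop stops at the first granted one. A SELECT-only user then pays for one failed INSERT check under the INSERT, SELECT, ALL order and for none under SELECT, INSERT, ALL; in the flow described earlier, a failed check scans every rule of every group, so the saved check is a full scan. The `Action` enum and the `checksUntilGranted` helper are hypothetical stand-ins for DBModelAction and the real evaluation loop.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; Action mirrors the declaration order reported for DBModelAction.
class ActionOrderSketch {
    enum Action { INSERT, SELECT, ALL }

    // EnumSet.allOf() iterates in declaration order: INSERT, SELECT, ALL.
    static final Iterable<Action> DECLARED_ORDER = EnumSet.allOf(Action.class);
    // Order tried in the second experiment: the common SELECT case first.
    static final List<Action> SELECT_FIRST = List.of(Action.SELECT, Action.INSERT, Action.ALL);

    // Counts the checks made until the first granted action (all checks if none is granted).
    static int checksUntilGranted(Iterable<Action> order, Predicate<Action> granted) {
        int checks = 0;
        for (Action a : order) {
            checks++;
            if (granted.test(a)) return checks;
        }
        return checks;
    }
}
```

For a user with no matching privilege, both orders perform all three checks, so the reordering only helps the common SELECT-only case.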

I believe pre-building Privilege objects at load time will greatly improve 
Sentry's performance.
A code snippet is available here; it is proof-of-concept code.

http://pastebin.com/S7uQnTN7

If everyone agrees this is a good idea, I can contribute some code.

Thanks
     Donghoon Han
