Hi Donghoon,

In the first experiment, you cache all privileges in local memory and do the authorization against that cache, but the cache is never refreshed. I think it is a temporary cache to work around the performance problem. For Sentry, we should have an overall solution for the local cache. For example: what is the policy for caching privileges (fetch additional privileges each time, or fetch all privileges)? How is the local cache refreshed (pull or push)? Feel free to discuss.
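To make the pull-or-push question a bit more concrete, here is a minimal sketch of one possible shape. It is only an illustration under assumptions: RefreshingSnapshot, its loader, and the clock parameter are hypothetical names and not part of Sentry. get() is the pull side (reload once the snapshot is older than a TTL), and invalidate() is a hook that a push-style change notification could call.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper, not a Sentry class: holds one cached snapshot
// (for example, all parsed privileges) and reloads it when it is stale.
final class RefreshingSnapshot<T> {
    private final Supplier<T> loader;   // assumed to fetch a fresh snapshot
    private final long ttlMillis;       // maximum age before a pull refresh
    private final LongSupplier clock;   // injectable so the policy can be tested
    private T value;
    private long loadedAt;
    private boolean loaded = false;

    RefreshingSnapshot(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Pull: reload on first use, after invalidate(), or once the TTL has elapsed.
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }

    // Push: a change notification could call this to force a reload on the next get().
    synchronized void invalidate() {
        loaded = false;
    }
}
```

In production code the clock would typically be System::currentTimeMillis. Whether the loader fetches everything or only a delta is exactly the caching-policy question that is still open.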
Best regards,
Colin Ma (Ma Jun Jie)

-----Original Message-----
From: nazgul33 [mailto:[email protected]]
Sent: Thursday, March 24, 2016 1:13 PM
To: [email protected]
Subject: [discuss] ResourceAuthorizationProvider.hasAccess() performance

-- Resending to [email protected]; I accidentally sent the first copy to [email protected] --

Hi all,

I'm Donghoon Han, a software developer. I subscribed to this mailing list to discuss a performance problem in Sentry.

My current job is to develop a type 3 JDBC driver for universal access to Hive, Impala and Phoenix, which has its own Sentry enforcement logic. We have 1000 users and 10k tables in a single Hive metastore. (Huge!)

One problematic requirement is to implement the JDBC API metadata.getTables() so that it lists only the tables the connected user has access to. To evaluate all tables, with 1000 public tables out of 10k tables and 1000 public users, the current logic requires about 200 seconds.

I found that a single hasAccess() call
- lists the rules of the groups the user belongs to: getPrivileges()
- iterates through the rules and tests the requested privilege
- while iterating, converts the text-based rules to Privilege objects, which are then discarded

I thought the performance problem lay in converting text rules to Privilege objects every time, so I wrote some code to experiment.

= first experiment =
1. get groups
2. list rules in groups and convert them to Privilege objects
3. put them in a HashMap<String, List<Privilege>>, where the key is the group name
4. the next evaluation uses the Privilege objects cached in the HashMap

This decreased the full evaluation time from 200 seconds to 10 seconds.

= second experiment =
- most users are public users, with SELECT only
- the current EnumSet.allOf(DBModelAction.class) has the ordering INSERT, SELECT, ALL, so I changed the evaluation order to SELECT, INSERT, ALL

10 seconds -> 5 seconds.

I believe pre-building Privilege objects at loading time will greatly enhance Sentry's performance. A code snippet is available here; it is "proof of idea" code:
http://pastebin.com/S7uQnTN7

If all agree that this could be a good idea, I think I can contribute some code.

Thanks
Donghoon Han
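
For readers who cannot open the pastebin link, the two experiments described above can be sketched roughly as follows. This is not the code from the link and not Sentry's API: Action, ParsedPrivilege, GroupPrivilegeCache, the rule loader, and the "table=...->action=..." rule format are simplified, hypothetical stand-ins. The sketch parses each group's rules once, keeps them in a HashMap keyed by group name, and checks SELECT before INSERT and ALL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for DBModelAction, declared in the order quoted in the mail.
enum Action { INSERT, SELECT, ALL }

// Hypothetical stand-in for a parsed Privilege object.
final class ParsedPrivilege {
    final String table;   // "*" matches any table
    final Action action;

    ParsedPrivilege(String table, Action action) {
        this.table = table;
        this.action = action;
    }

    // Parses a made-up, simplified rule format: "table=<name>->action=<action>".
    static ParsedPrivilege parse(String rule) {
        String[] parts = rule.split("->");
        String table = parts[0].substring("table=".length());
        String action = parts[1].substring("action=".length());
        return new ParsedPrivilege(table, Action.valueOf(action.toUpperCase(Locale.ROOT)));
    }

    boolean implies(String requestedTable, Action requested) {
        boolean tableMatches = table.equals("*") || table.equals(requestedTable);
        boolean actionMatches = action == Action.ALL || action == requested;
        return tableMatches && actionMatches;
    }
}

final class GroupPrivilegeCache {
    // Second experiment: try the most common action (SELECT) first.
    private static final List<Action> CHECK_ORDER =
            Arrays.asList(Action.SELECT, Action.INSERT, Action.ALL);

    private final Map<String, List<ParsedPrivilege>> byGroup = new HashMap<>();
    private final Function<String, List<String>> ruleLoader; // group name -> text rules
    int parseCount = 0; // counts text-to-object conversions, for illustration only

    GroupPrivilegeCache(Function<String, List<String>> ruleLoader) {
        this.ruleLoader = ruleLoader;
    }

    // First experiment: convert a group's text rules once, then reuse the objects.
    private List<ParsedPrivilege> privilegesFor(String group) {
        return byGroup.computeIfAbsent(group, g -> {
            List<ParsedPrivilege> parsed = new ArrayList<>();
            for (String rule : ruleLoader.apply(g)) {
                parsed.add(ParsedPrivilege.parse(rule));
                parseCount++;
            }
            return parsed;
        });
    }

    boolean hasAccess(Set<String> groups, String table, Action action) {
        for (String group : groups) {
            for (ParsedPrivilege p : privilegesFor(group)) {
                if (p.implies(table, action)) {
                    return true;
                }
            }
        }
        return false;
    }

    // True if the groups hold any privilege on the table; stops at the first hit.
    boolean canSee(Set<String> groups, String table) {
        for (Action a : CHECK_ORDER) {
            if (hasAccess(groups, table, a)) {
                return true;
            }
        }
        return false;
    }
}

final class PrivilegeCacheSketch {
    public static void main(String[] args) {
        Map<String, List<String>> rules = new HashMap<>();
        rules.put("public", Arrays.asList("table=t1->action=select"));
        GroupPrivilegeCache cache = new GroupPrivilegeCache(
                g -> rules.getOrDefault(g, Collections.<String>emptyList()));
        Set<String> groups = Collections.singleton("public");
        System.out.println(cache.canSee(groups, "t1"));
        System.out.println(cache.canSee(groups, "t2"));
        System.out.println("rules parsed: " + cache.parseCount);
    }
}
```

Note that this cache is never refreshed or invalidated and is not thread-safe; it only illustrates why repeated evaluations get cheaper once the text rules are no longer re-parsed on every call.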
