[ 
https://issues.apache.org/jira/browse/IMPALA-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952137#comment-16952137
 ] 

ASF subversion and git services commented on IMPALA-9002:
---------------------------------------------------------

Commit 63a1d210d3476fa6f673c640bb26cd96c835c641 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=63a1d21 ]

IMPALA-9002: Add query option to only check SELECT privilege in SHOW TABLES

If authorization is enabled, SHOW TABLES statements or GET_TABLES
requests in the HS2 protocol (used by HUE and JDBC drivers) only return
tables on which the user has ANY privilege. If the user doesn't have
any privilege on a table, we need 8 privilege checks (ALL, INSERT,
SELECT, ALTER, CREATE, DROP, OWNER, REFRESH) to reach this conclusion.
Checking these one by one in Sentry takes time when there are thousands
of tables. Unfortunately, there is no batch API for these checks. This
has been a performance regression since we added support for
fine-grained privileges; before that we only checked 3 privileges (ALL,
INSERT, SELECT).

In practice, SELECT privilege is the minimal privilege set. It's weird
to grant INSERT or other privileges on a resource without SELECT
privilege. We can simplify the process to only check the SELECT
privilege if users make sure that SELECT privilege is the minimal
privilege set in their environment. This patch adds a flag
(SIMPLIFY_CHECK_ON_SHOW_TABLES) to bypass checking other privileges in
SHOW TABLES statements.
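
The effect of the simplified check can be illustrated with a small
standalone model. This is only a sketch: the class, its methods, and the
in-memory grants map are hypothetical and are not Impala's or Sentry's
classes, each privilege is treated as independent, and the order in
which ANY is expanded is assumed.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the table-visibility filter; these are not Impala's classes.
final class ShowTablesFilterSketch {
  // Declared in the order of the ANY implied set quoted in the issue (order assumed).
  enum Privilege { ALL, OWNER, ALTER, DROP, CREATE, INSERT, SELECT, REFRESH }

  private final Map<String, Set<Privilege>> grants;  // table name -> granted privileges
  private int checks = 0;                            // single-privilege lookups issued

  ShowTablesFilterSketch(Map<String, Set<Privilege>> grants) {
    this.grants = grants;
  }

  int checks() {
    return checks;
  }

  // Stands in for one round trip to the authorization provider.
  private boolean hasPrivilege(String table, Privilege p) {
    checks++;
    Set<Privilege> granted = grants.get(table);
    return granted != null && granted.contains(p);
  }

  // Returns the tables the user may see. With simplifyCheck set, only SELECT is tested.
  List<String> visibleTables(List<String> tables, boolean simplifyCheck) {
    List<Privilege> toCheck =
        simplifyCheck ? List.of(Privilege.SELECT) : List.of(Privilege.values());
    List<String> visible = new ArrayList<>();
    for (String table : tables) {
      for (Privilege p : toCheck) {
        if (hasPrivilege(table, p)) {  // stop at the first privilege that is granted
          visible.add(table);
          break;
        }
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    Map<String, Set<Privilege>> grants = Map.of(
        "t1", EnumSet.of(Privilege.SELECT),
        "t2", EnumSet.of(Privilege.INSERT));
    List<String> tables = List.of("t1", "t2", "t3");

    ShowTablesFilterSketch full = new ShowTablesFilterSketch(grants);
    System.out.println(full.visibleTables(tables, false) + " checks=" + full.checks());

    ShowTablesFilterSketch simplified = new ShowTablesFilterSketch(grants);
    System.out.println(
        simplified.visibleTables(tables, true) + " checks=" + simplified.checks());
  }
}
```

In this model the full check lists t1 and t2 after 21 lookups, while the
simplified check lists only t1 after 3 lookups. The table with INSERT
but no SELECT disappears from the result, which is why the flag is only
safe when SELECT really is the minimal privilege set.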

Tested in a database with 40k tables where the user is granted SELECT
privilege on only 6 tables. When using Sentry, the SHOW TABLES statement
takes 5s. With SIMPLIFY_CHECK_ON_SHOW_TABLES enabled, the time drops
to 1.2s. No performance gain is observed when using Ranger since Ranger
is fast enough.
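
As a rough cross-check of the Sentry numbers, the lookup counts for this
test setup can be estimated as below. The helper and its parameters are
hypothetical, and the estimate assumes that the 6 granted tables carry
only SELECT and that SELECT is the 7th privilege tried when ANY is
expanded.

```java
// Back-of-the-envelope count of privilege lookups; all names here are hypothetical.
final class CheckCountEstimate {
  // tables: tables in the database; granted: tables the user can see;
  // checksPerMiss: lookups spent on a table with no privilege;
  // checksPerHit: lookups spent until a granted privilege is found.
  static long totalChecks(long tables, long granted, int checksPerMiss, int checksPerHit) {
    return (tables - granted) * checksPerMiss + granted * checksPerHit;
  }

  public static void main(String[] args) {
    // ANY: 8 lookups per miss, SELECT found on the 7th lookup (assumed order).
    long any = totalChecks(40_000, 6, 8, 7);
    // Simplified: exactly one SELECT lookup per table.
    long selectOnly = totalChecks(40_000, 6, 1, 1);
    System.out.println(any + " vs " + selectOnly);
  }
}
```

Under these assumptions the estimate is 319994 lookups against 40000,
roughly 8 times fewer. The measured time drops by about 4 times (5s to
1.2s), so part of the cost may lie outside the per-privilege lookups.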

Tests:
 - Add custom cluster test for the flag in test_authorization.py for
 both Sentry and Ranger.
 - Run CORE tests

Change-Id: I17e2b7bf9e36c54627276a6812b459912156cc3c
Reviewed-on: http://gerrit.cloudera.org:8080/14400
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add flag to only check SELECT privilege in GET_TABLES
> -----------------------------------------------------
>
>                 Key: IMPALA-9002
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9002
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Security
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> In Frontend.doGetTableNames(), if authorization is enabled, we only return
> tables on which the current user has ANY privilege:
> {code:java}
>   private List<String> doGetTableNames(String dbName, PatternMatcher matcher,
>       User user) throws ImpalaException {
>     FeCatalog catalog = getCatalog();
>     List<String> tblNames = catalog.getTableNames(dbName, matcher);
>     if (authzFactory_.getAuthorizationConfig().isEnabled()) {
>       Iterator<String> iter = tblNames.iterator();
>       while (iter.hasNext()) {
>         ......
>         PrivilegeRequest privilegeRequest = new PrivilegeRequestBuilder(
>             authzFactory_.getAuthorizableFactory())
>             .any().onAnyColumn(dbName, tblName, tableOwner).build();
>         // The request above requires ANY privilege.
>         if (!authzChecker_.get().hasAccess(user, privilegeRequest)) {
>           iter.remove();
>         }
>       }
>     }
>     return tblNames;
>   } {code}
> In the Sentry integration, checking the ANY privilege checks all possible
> privileges, i.e. ALL, OWNER, ALTER, DROP, CREATE, INSERT, SELECT, REFRESH,
> until one is permitted. In the worst case, where the current user doesn't
> have any privilege on a table, we need to perform 8 checks on this table.
> {code:java}
> public enum Privilege {
>   ...
>   static {
>     ...
>     ANY.implied_ = EnumSet.of(ALL, OWNER, ALTER, DROP, CREATE, INSERT, SELECT,
>         REFRESH); {code}
> GET_TABLES performance is poor when there are thousands of tables. It's
> reasonable to only return tables on which the current user has SELECT
> privilege. Checking only the SELECT privilege can make the request up to
> 8 times faster. In my experiment on impala-2.12-cdh5.16.2 with 40k tables,
> GET_TABLES originally takes 16s when the current user only has privileges
> on 6 tables. With this change, the time drops to 2s.
> We can add a flag to only check the SELECT privilege for table visibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

