[ 
https://issues.apache.org/jira/browse/SENTRY-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101762#comment-15101762
 ] 

Hadoop QA commented on SENTRY-1007:
-----------------------------------

Here are the results of testing the latest attachment
https://issues.apache.org/jira/secure/attachment/12782514/SENTRY-1007.002.patch 
against master.

{color:green}Overall:{color} +1 all checks pass

{color:green}SUCCESS:{color} all tests passed

Console output: 
https://builds.apache.org/job/PreCommit-SENTRY-Build/1072/console

This message is automatically generated.

> Sentry column-level performance for wide tables
> -----------------------------------------------
>
>                 Key: SENTRY-1007
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1007
>             Project: Sentry
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Anne Yu
>            Assignee: Dapeng Sun
>            Priority: Critical
>         Attachments: SENTRY-1007.002.patch
>
>
> It appears a query is taking a long time on a wide table due many Sentry 
> column-level auth checks. 
> here are some investigation results:
> 1) create a table with 4000 columns, grant select on [table|db] to test user, 
> however {noformat}select * from table{noformat} still validates column level 
> privilege on each column. So this select command issues 4000 queries to 
> validate column level permissions. It takes 40 seconds on my test cluster (2x 
> large) to return results, for a moment, query seems to freeze:
> {noformat}
> ...
> 2016-01-11 11:54:14,816 INFO DataNucleus.Query: Reading in results for query 
> "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege 
> WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == 
> "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == 
> "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && 
> (URI == "__NULL__") && ((columnName == "test997") || (columnName == 
> "__NULL__")) && (URI == "__NULL__") VARIABLES 
> org.apache.sentry.provider.db.service.model.MSentryRole role" since the 
> connection used is closing
> 2016-01-11 11:54:14,822 INFO DataNucleus.Query: Reading in results for query 
> "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege 
> WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == 
> "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == 
> "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && 
> (URI == "__NULL__") && ((columnName == "test998") || (columnName == 
> "__NULL__")) && (URI == "__NULL__") VARIABLES 
> org.apache.sentry.provider.db.service.model.MSentryRole role" since the 
> connection used is closing
> 2016-01-11 11:54:14,828 INFO DataNucleus.Query: Reading in results for query 
> "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege 
> WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == 
> "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == 
> "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && 
> (URI == "__NULL__") && ((columnName == "test999") || (columnName == 
> "__NULL__")) && (URI == "__NULL__") VARIABLES 
> org.apache.sentry.provider.db.service.model.MSentryRole role" since the 
> connection used is closing
> {noformat}
> Here is the debug log from sentry service:
> {noformat}
> org.apache.sentry.binding.hive.authz.HiveAuthzBinding.authorize(HiveAuthzBinding.java:304)]
>  requiredInputPrivileges = {Table=[SELECT], Column=[SELECT], URI=[ALL]}
> {noformat}
> 2) the same issue for {noformat}show columns in table{noformat}
> 3) for {noformat}show count(*) in table{noformat}, it requires table level 
> privilege, so this issue doesn't exist. 
> 4) I found out there are more commands have the same issues are:
> {code}
> SHOW COLUMNS FROM test_tb1;
> create table test_tb2 as select * from test_tb1
> show partitions in test_tb1
> select * from test_tb1
> {code}
> Even for {code}select col1,col2,col3 from test_tb1{code} will issue 3 queries 
> for each column, instead of one query for all columns in one table;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to