[ https://issues.apache.org/jira/browse/SENTRY-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101535#comment-15101535 ]
Dapeng Sun commented on SENTRY-1007: ------------------------------------ Link for RB https://reviews.apache.org/r/42344/ > Sentry column-level performance for wide tables > ----------------------------------------------- > > Key: SENTRY-1007 > URL: https://issues.apache.org/jira/browse/SENTRY-1007 > Project: Sentry > Issue Type: Bug > Affects Versions: 1.5.1 > Reporter: Anne Yu > Assignee: Dapeng Sun > Priority: Critical > Attachments: SENTRY-1007.001.patch > > > It appears a query is taking a long time on a wide table due many Sentry > column-level auth checks. > here are some investigation results: > 1) create a table with 4000 columns, grant select on [table|db] to test user, > however {noformat}select * from table{noformat} still validates column level > privilege on each column. So this select command issues 4000 queries to > validate column level permissions. It takes 40 seconds on my test cluster (2x > large) to return results, for a moment, query seems to freeze: > {noformat} > ... > 2016-01-11 11:54:14,816 INFO DataNucleus.Query: Reading in results for query > "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege > WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == > "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == > "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && > (URI == "__NULL__") && ((columnName == "test997") || (columnName == > "__NULL__")) && (URI == "__NULL__") VARIABLES > org.apache.sentry.provider.db.service.model.MSentryRole role" since the > connection used is closing > 2016-01-11 11:54:14,822 INFO DataNucleus.Query: Reading in results for query > "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege > WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == > "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == > "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && > (URI == "__NULL__") && ((columnName == "test998") || (columnName == > "__NULL__")) && (URI == "__NULL__") VARIABLES > org.apache.sentry.provider.db.service.model.MSentryRole role" since the > connection used is closing > 2016-01-11 11:54:14,828 INFO DataNucleus.Query: Reading in results for query > "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege > WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == > "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == > "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && > (URI == "__NULL__") && ((columnName == "test999") || (columnName == > "__NULL__")) && (URI == "__NULL__") VARIABLES > org.apache.sentry.provider.db.service.model.MSentryRole role" since the > connection used is closing > {noformat} > Here is the debug log from sentry service: > {noformat} > org.apache.sentry.binding.hive.authz.HiveAuthzBinding.authorize(HiveAuthzBinding.java:304)] > requiredInputPrivileges = {Table=[SELECT], Column=[SELECT], URI=[ALL]} > {noformat} > 2) the same issue for {noformat}show columns in table{noformat} > 3) for {noformat}show count(*) in table{noformat}, it requires table level > privilege, so this issue doesn't exist. > 4) I found out there are more commands have the same issues are: > {code} > SHOW COLUMNS FROM test_tb1; > create table test_tb2 as select * from test_tb1 > show partitions in test_tb1 > select * from test_tb1 > {code} > Even for {code}select col1,col2,col3 from test_tb1{code} will issue 3 queries > for each column, instead of one query for all columns in one table; -- This message was sent by Atlassian JIRA (v6.3.4#6332)