xuzhou created IMPALA-9195:
------------------------------
Summary: Using multithreaded execution to accelerate ‘show
tables/databases’
Key: IMPALA-9195
URL: https://issues.apache.org/jira/browse/IMPALA-9195
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: xuzhou
Impala version: 2.12
Using Sentry for authorization
When a user with multiple group policies (which may be nested) executes
'show tables/databases', the latency is very long. In my case, the database
has 910 tables, and the user waited 65.886 seconds to get 160 tables.
I studied the code and found that Frontend.getTableNames executes:
    for table in tables:
        for action in actions (all actions defined in DBModelAction):
            ResourceAuthorizationProvider.hasAccess
It seems that 'hasAccess' is responsible for the bad performance when checking
users with complex group policies.
I tried using 16 threads in getTableNames, and it then took 4.752 seconds in
my case.
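
A minimal sketch of the kind of change described above, not the actual patch:
each table's access check is submitted to a fixed-size thread pool and the
results are collected in the original order. The class ParallelTableFilter,
the method filterVisible, and the hasAccess predicate are hypothetical names
used for illustration only; the predicate stands in for the per-table call
into ResourceAuthorizationProvider.hasAccess, and the sketch assumes that
check is safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper, not part of Impala.
public class ParallelTableFilter {

  // Returns the table names accepted by hasAccess, preserving input order.
  // hasAccess is an assumed stand-in for the per-table authorization check.
  public static List<String> filterVisible(
      List<String> tableNames, Predicate<String> hasAccess, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one access check per table so the checks can run concurrently.
      List<Future<Boolean>> results = new ArrayList<>();
      for (String name : tableNames) {
        Callable<Boolean> check = () -> hasAccess.test(name);
        results.add(pool.submit(check));
      }
      // Collect results in submission order so the output order is stable.
      List<String> visible = new ArrayList<>();
      for (int i = 0; i < tableNames.size(); i++) {
        if (results.get(i).get()) {
          visible.add(tableNames.get(i));
        }
      }
      return visible;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while checking access", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Access check failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

With 16 threads, a call would look like
ParallelTableFilter.filterVisible(tableNames, check, 16). The speedup depends
on how much of the time is spent inside the access check and on whether that
check contends on shared state.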
The code seems to be the same when using the Sentry service in the latest
Impala. I'm not sure whether any improvement has been made in the latest
Sentry service, as I failed to migrate from file-based Sentry authorization to
the Sentry service.
I see that Ranger is supported in the latest Impala. Does Ranger have a
similar problem?
It seems 'show tables/databases' can benefit from multithreaded execution when
using Sentry. Is it reasonable to support such operations through the query
option MT_DOP?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)