Madhan Neethiraj created RANGER-4741:
----------------------------------------

             Summary: Hive plugin optimization to avoid excessive metastore API calls
                 Key: RANGER-4741
                 URL: https://issues.apache.org/jira/browse/RANGER-4741
             Project: Ranger
          Issue Type: Improvement
          Components: plugins
            Reporter: Madhan Neethiraj
            Assignee: Madhan Neethiraj


Authorizing access to tables with a large number of columns can take a long 
time, as shown below. For example, the query on large_tbl_4000 takes about 100 
seconds.
{noformat}
0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_4000;
...
No rows selected (98.674 seconds)


0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_1000;
...
No rows selected (10.4 seconds)
{noformat}
 

For each column referenced in the query, the Ranger Hive authorizer calls the 
metastore API to obtain the owner of the table. Calling the metastore API only 
once per table can significantly reduce the time taken to authorize queries.
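
The once-per-table lookup described above can be illustrated with a minimal sketch. This is not the Ranger implementation: the class `TableOwnerCache`, the function `owners_for_columns`, and the fake metastore `fake_get_table_owner` are hypothetical names chosen for illustration, and the sketch assumes the owner lookup can be keyed by database and table name for the duration of one query.

```python
from typing import Callable, Dict, List, Tuple


class TableOwnerCache:
    """Per-query cache of table owners (hypothetical helper, not Ranger code)."""

    def __init__(self, fetch_owner: Callable[[str, str], str]) -> None:
        # fetch_owner stands in for the metastore API call (an assumption).
        self._fetch_owner = fetch_owner
        self._owners: Dict[Tuple[str, str], str] = {}

    def owner_of(self, database: str, table: str) -> str:
        key = (database, table)
        # Call the metastore only the first time a table is seen.
        if key not in self._owners:
            self._owners[key] = self._fetch_owner(database, table)
        return self._owners[key]


def owners_for_columns(
    columns: List[Tuple[str, str, str]], cache: TableOwnerCache
) -> List[str]:
    """Resolve the table owner for each (database, table, column) reference."""
    return [cache.owner_of(db, tbl) for db, tbl, _col in columns]


# Fake metastore that records every lookup it receives.
metastore_calls: List[Tuple[str, str]] = []


def fake_get_table_owner(database: str, table: str) -> str:
    metastore_calls.append((database, table))
    return "hive"


cache = TableOwnerCache(fake_get_table_owner)
columns = [("default", "large_tbl_1000", f"col_{i}") for i in range(1000)]
owners = owners_for_columns(columns, cache)
print(len(owners), len(metastore_calls))
```

In this sketch the cache is created per query, so a change of table ownership between queries would still be picked up on the next query.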

Here is the time taken to query the same tables with the Ranger Hive authorizer 
optimized to call the metastore API only once per table referenced in the query:
{noformat}
0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_4000;
...
No rows selected (1.328 seconds)


0: jdbc:hive2://localhost:10000> SELECT * FROM large_tbl_1000;
...
No rows selected (0.194 seconds)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
