henrib opened a new pull request, #4601:
URL: https://github.com/apache/hive/pull/4601

   
   ### What changes were proposed in this pull request?
   When filtering tables (or table names) through applicable privileges, the 
current code compares every table to verify against every accessible table 
(a cartesian product). When the number of tables is large (around 200k) and 
all or most of them are accessible, these filters take a very long time to run.
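   For illustration, the nested-loop filtering described above might look like the minimal Java sketch below. It is not Hive's code. The class `NestedLoopTableFilter` is hypothetical, and so is the representation of tables as qualified `"db.table"` strings. Each candidate is compared against the accessible list until a match is found. When most tables are accessible, this still costs on the order of m*n comparisons.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a nested-loop privilege filter (not Hive's code).
 * Tables are assumed to be qualified "db.table" strings, already normalized.
 * Worst-case cost: O(m * n) string comparisons.
 */
final class NestedLoopTableFilter {
    private NestedLoopTableFilter() {}

    static List<String> filter(List<String> candidates, List<String> accessible) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {        // m iterations
            for (String granted : accessible) {      // up to n comparisons each
                if (candidate.equals(granted)) {
                    result.add(candidate);
                    break;                           // stop at the first match
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("sales.orders", "hr.salaries", "sales.customers");
        List<String> accessible = List.of("sales.customers", "sales.orders");
        System.out.println(filter(candidates, accessible)); // [sales.orders, sales.customers]
    }
}
```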
   
   This PR improves the filtering algorithm by first sorting the list of 
accessible tables (through a TableIndex derived class) and then looping over 
the tables to filter, using a binary-search lookup in that index. Complexity 
goes from m*n to (m+n)*log(n), where m is the number of tables to filter and n 
the number of accessible tables, which brings performance to acceptable levels.
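   The sort-then-binary-search idea can be sketched in Java as follows. This is a hypothetical illustration, not the PR's actual TableIndex class. The name `SortedTableIndex` and the use of normalized `"db.table"` strings are assumptions. Sorting the n accessible names costs O(n log n), and each of the m lookups costs O(log n), which gives O((m+n) log n) overall.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/**
 * Hypothetical sketch (not Hive's TableIndex): sort the accessible table
 * names once, then answer each membership query with a binary search.
 * Names are assumed to be normalized "db.table" strings (e.g. lowercased).
 */
final class SortedTableIndex {
    private final String[] sorted;

    SortedTableIndex(Collection<String> accessible) {
        this.sorted = accessible.toArray(new String[0]);
        Arrays.sort(this.sorted);                                // O(n log n), natural String order
    }

    boolean contains(String qualifiedName) {
        return Arrays.binarySearch(sorted, qualifiedName) >= 0;  // O(log n) per lookup
    }

    /** Keeps accessible candidates in their original order: O(m log n). */
    List<String> filter(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (contains(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedTableIndex index = new SortedTableIndex(List.of("sales.orders", "sales.customers"));
        System.out.println(index.filter(List.of("sales.orders", "hr.salaries", "sales.customers")));
        // [sales.orders, sales.customers]
    }
}
```

As a rough count, assume m = n = 200,000. The nested loop then needs up to 4×10^10 comparisons, while the sorted approach needs about (4×10^5) × log2(2×10^5) ≈ 7×10^6. A `HashSet` would give expected O(m+n) lookups instead. The sorted array shown here matches the binary-search approach described in the PR.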
   
   ### Why are the changes needed?
   Performance is unacceptable for large sets of tables.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### Is the change a dependency upgrade?
   No
   
   
   ### How was this patch tested?
   Unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
