henrib opened a new pull request, #4601: URL: https://github.com/apache/hive/pull/4601
### What changes were proposed in this pull request?
When filtering tables (or table names) by applicable privileges, the current code uses a Cartesian product (tables to verify * accessible tables). When the number of tables is large (around 200k) and all or most of them are accessible, calling these filters takes a very long time.

This PR improves the current filtering algorithm by first sorting the list of accessible tables (through a TableIndex-derived class) and then looping over the tables to filter, using a binary-search lookup in that index. With m tables to filter and n accessible tables, complexity goes from m*n to (m+n)*log(n), which brings performance to acceptable levels.

### Why are the changes needed?
Unacceptable performance for large sets of tables.

### Does this PR introduce _any_ user-facing change?
No

### Is the change a dependency upgrade?
No

### How was this patch tested?
Unit tests
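
The sort-then-binary-search idea described in this PR can be illustrated with a minimal sketch. This is not the code from the patch: the class `SortedNameFilter`, its methods, and the use of plain `db.table` strings as keys are all assumptions made for illustration, and the PR's actual `TableIndex`-derived class may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the TableIndex class from the PR.
public class SortedNameFilter {
    // Accessible table names, sorted once at construction time.
    private final String[] sortedAccessible;

    public SortedNameFilter(List<String> accessible) {
        // Copy and sort the n accessible names: O(n log n).
        sortedAccessible = accessible.toArray(new String[0]);
        Arrays.sort(sortedAccessible);
    }

    // Binary-search lookup in the sorted array: O(log n) per call.
    public boolean contains(String qualifiedName) {
        return Arrays.binarySearch(sortedAccessible, qualifiedName) >= 0;
    }

    // Keep only the candidates present in the index: O(m log n) for m candidates.
    // The input order of the candidates is preserved.
    public List<String> filter(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (contains(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Building the index costs n*log(n) and each of the m lookups costs log(n), which gives the (m+n)*log(n) total quoted in the description. The sketch matches on exact qualified names only; whatever key the real filter compares on would take the place of the string here.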