[
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507513#comment-15507513
]
ASF GitHub Bot commented on DRILL-4826:
---------------------------------------
Github user gparai commented on a diff in the pull request:
https://github.com/apache/drill/pull/592#discussion_r79688118
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
}
@Override
- public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
+ public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
+ final int totalTables = tableNames.size();
final String schemaName = getName();
- final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
- List<org.apache.hadoop.hive.metastore.api.Table> tables;
- try {
- tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
- } catch (TException e) {
- logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
- return tableNameToTable;
+ final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
+
+ // In each round, Drill asks for a sub-list of all the requested tables
+ for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
--- End diff --
Missing space after `for`?
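For readers following the diff, the batching idea in the loop above (ask for a sub-list of the requested tables in each round) can be sketched on its own. This is a minimal illustration, not the code from the pull request: the class `BulkBatchSketch` and the method `partition` are hypothetical names, and the metastore call is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: shows only the index arithmetic
// for walking a list of table names in rounds of at most bulkSize entries.
final class BulkBatchSketch {

  /** Splits names into consecutive sub-lists of at most bulkSize elements. */
  static List<List<String>> partition(final List<String> names, final int bulkSize) {
    if (bulkSize <= 0) {
      throw new IllegalArgumentException("bulkSize must be positive");
    }
    final int total = names.size();
    final List<List<String>> batches = new ArrayList<>();
    int fromIndex = 0;
    while (fromIndex < total) {
      // The last round may be shorter than bulkSize.
      final int length = Math.min(bulkSize, total - fromIndex);
      // Copy the view returned by subList so each batch is independent.
      batches.add(new ArrayList<>(names.subList(fromIndex, fromIndex + length)));
      fromIndex += length;
    }
    return batches;
  }
}
```

In a real caller, each batch would presumably be handed to the metastore client in turn and the results accumulated, so that one failing or oversized request does not cover every table at once.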
> Query against INFORMATION_SCHEMA.TABLES degrades as the number of views
> increases
> ---------------------------------------------------------------------------------
>
> Key: DRILL-4826
> URL: https://issues.apache.org/jira/browse/DRILL-4826
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Parth Chandra
> Assignee: Parth Chandra
>
> Queries against INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.VIEWS slow
> down as the number of views increases.
> BI tools like Tableau issue a query like the following at connection time:
> {code}
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from
> INFORMATION_SCHEMA.`TABLES` WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\' AND
> TABLE_SCHEMA <> 'sys' AND TABLE_SCHEMA <> 'INFORMATION_SCHEMA' ORDER BY
> TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
> {code}
> The time to query the information schema tables degrades as the number of
> views increases. On a test system:
> || Views || Time(secs) ||
> |500 | 6 |
> |1000 | 19 |
> |1500 | 33 |
> This can result in a single connection taking more than a minute to establish.
> The problem occurs because we read the view file for every view and this
> appears to take most of the time.
> Querying information_schema.tables does not, in fact, need to open the view
> file at all; it merely needs to get a listing of the view files. Eliminating
> the view file read will speed up the query tremendously.
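The difference between listing view files and reading them can be illustrated with a small sketch. It rests on assumptions: it uses `java.nio.file` on a local directory as a stand-in for whatever file system API Drill actually uses, the class `ViewListingSketch` and the method `listViewNames` are hypothetical, and it relies on view definitions being stored in files whose names end in `.view.drill`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Drill code: derives view names from a
// directory listing alone, without opening any view file.
final class ViewListingSketch {

  // Assumed file-name suffix for stored view definitions.
  static final String VIEW_SUFFIX = ".view.drill";

  /** Returns the sorted view names found in workspaceDir, based on file names only. */
  static List<String> listViewNames(final Path workspaceDir) throws IOException {
    final List<String> names = new ArrayList<>();
    // The glob filters on the file name; file contents are never read.
    try (DirectoryStream<Path> entries =
        Files.newDirectoryStream(workspaceDir, "*" + VIEW_SUFFIX)) {
      for (Path entry : entries) {
        final String fileName = entry.getFileName().toString();
        names.add(fileName.substring(0, fileName.length() - VIEW_SUFFIX.length()));
      }
    }
    Collections.sort(names);
    return names;
  }
}
```

The cost of such a listing grows with the number of directory entries rather than with the size or parse time of each view definition, which is the saving the description above is pointing at.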