[
https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dechang Gu closed DRILL-4127.
-----------------------------
Verified with the perf test framework.
Without the patch (commit id: 539cbba):
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 126599 msec
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 165969 msec
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 163977 msec
With the patch (Apache Drill 1.5.0 GA, commit id: 3f228d3), the same query:
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 1664 msec
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 157 msec
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 167 msec
So, LGTM.
> HiveSchema.getSubSchema() should use lazy loading of all the table names
> ------------------------------------------------------------------------
>
> Key: DRILL-4127
> URL: https://issues.apache.org/jira/browse/DRILL-4127
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Fix For: 1.5.0
>
>
> Currently, HiveSchema.getSubSchema() pre-loads all the table names when it
> constructs the subschema, even if those table names are never requested.
> This can cause considerable performance overhead, especially when the Hive
> schema contains a large number of objects (thousands of tables/views are
> not uncommon in some use cases).
> Instead, we should load table names on demand: only when there is a
> request to get all table names do we load them into the Hive schema.
> This should help "show schemas", since it only requires the schema name, not
> the table names in the schema.
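
For readers unfamiliar with the pattern, the on-demand loading described above can be sketched roughly as follows. This is only an illustration, not the actual DRILL-4127 patch: the class name LazyTableNamesSketch and the metastoreLookup supplier are hypothetical stand-ins for the real HiveSchema and the Hive metastore call.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for a per-database Hive subschema.
 * Not Drill's HiveSchema; names and structure are assumptions for illustration.
 */
class LazyTableNamesSketch {
    private final String schemaName;
    // Assumed abstraction over the (expensive) metastore call that lists tables.
    private final Supplier<List<String>> metastoreLookup;
    // Stays null until some caller actually asks for the table names.
    private Set<String> tableNames;

    LazyTableNamesSketch(String schemaName, Supplier<List<String>> metastoreLookup) {
        // Constructing the subschema does NOT touch the metastore.
        this.schemaName = schemaName;
        this.metastoreLookup = metastoreLookup;
    }

    /** Cheap: enough for operations that only need the schema name. */
    String getName() {
        return schemaName;
    }

    /** Loads the table names on first request, then serves the cached copy. */
    synchronized Set<String> getTableNames() {
        if (tableNames == null) {
            tableNames = Collections.unmodifiableSet(
                    new LinkedHashSet<>(metastoreLookup.get()));
        }
        return tableNames;
    }
}
```

With this shape, constructing the object and calling getName() never invokes the lookup; only the first getTableNames() call does, and later calls reuse the cached set.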