[ 
https://issues.apache.org/jira/browse/IMPALA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112927#comment-17112927
 ] 

ASF subversion and git services commented on IMPALA-8606:
---------------------------------------------------------

Commit 2e07d0c07febf1d1ee9324708d543b792fb45b00 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2e07d0c ]

IMPALA-9669: Fix wrong types/comments of loaded tables/views for GET_TABLES in 
LocalCatalog

A coordinator runs in one of two modes: legacy mode or LocalCatalog mode.
Before IMPALA-8606, GET_TABLES required all tables to be loaded into the
cache of a LocalCatalog-mode coordinator, a performance regression
compared to legacy-mode coordinators. IMPALA-8606 changed the behavior
to load only the table names and create a LocalIncompleteTable for each
table. This improves performance, but all views are then returned with
the default table type (TABLE), and all returned comments are empty even
if the table/view is loaded. This is a regression, since legacy
coordinators show loaded tables/views with the correct table types and
comments.

This is fixed by loading table types and comments along with table names
from catalogd. The cached list of table names of a DB becomes a map
containing brief table metadata (name, type, comment). To handle stale
types or comments in this map, the coordinator checks the type and
comment whenever it loads the msTable of a table, and invalidates the
table list if either is stale.
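
The caching scheme described above can be sketched roughly as follows. This
is a toy Python model, not the actual Impala code (the real change lives in
the Java frontend); the names BriefTableMeta, DbTableListCache,
fetch_table_list and on_table_loaded are hypothetical and exist only for
illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class BriefTableMeta:
    """Hypothetical brief per-table metadata: name, type, comment."""
    name: str
    table_type: str  # e.g. "TABLE" or "VIEW"
    comment: str


class DbTableListCache:
    """Toy model of a coordinator's cached table list for one DB."""

    def __init__(self, fetch_table_list: Callable[[], List[BriefTableMeta]]):
        # fetch_table_list stands in for the request to catalogd
        # (assumption: it returns name, type and comment for every table).
        self._fetch = fetch_table_list
        self._tables: Optional[Dict[str, BriefTableMeta]] = None

    def get_tables(self) -> Dict[str, BriefTableMeta]:
        # Load the brief metadata lazily; no full table load is needed.
        if self._tables is None:
            self._tables = {m.name: m for m in self._fetch()}
        return self._tables

    def on_table_loaded(self, name: str, loaded_type: str,
                        loaded_comment: str) -> bool:
        """Compare freshly loaded type/comment against the cached entry.

        Returns True if the cached table list was invalidated.
        """
        if self._tables is None:
            return False
        cached = self._tables.get(name)
        if cached is None:
            return False
        if cached.table_type != loaded_type or cached.comment != loaded_comment:
            # Stale entry: drop the whole list so the next GET_TABLES
            # refetches it.
            self._tables = None
            return True
        return False
```

In this sketch, a GET_TABLES call only needs get_tables(), while a later
full load of a single table calls on_table_loaded() to detect a stale type
or comment and force a refetch of the list.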

Tests
 - Add tests in test_hs2.test_get_tables and manually test it in
   LocalCatalog mode.
 - Run CORE tests.

Change-Id: I2180c603f061838347936f718cd4a0257d82e633
Reviewed-on: http://gerrit.cloudera.org:8080/15887
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> GET_TABLES performance in local catalog mode
> --------------------------------------------
>
>                 Key: IMPALA-8606
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8606
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.2.0
>            Reporter: Balazs Jeszenszky
>            Assignee: Quanlong Huang
>            Priority: Blocker
>              Labels: catalog-v2
>             Fix For: Impala 3.3.0
>
>
> With local catalog mode enabled, GET_TABLES JDBC requests return more
> than the table information that is always available. Any request for more
> metadata about a table triggers a full load of that table on the catalogd
> side, so GET_TABLES triggers a load of the entire catalog. Also, as far
> as I can see, the requests for more metadata are made one table at a time.
> Once the tables are loaded on the catalogd side, a coordinator needs 3
> roundtrips to the catalog to fetch all the details about a single table. My
> test case had around 57k tables, 1700 DBs, and ~120k partitions.
> GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog but a
> cold impalad, it still takes ~70 seconds.
> Many tools use GET_TABLES to populate dropdowns, etc., so this is bad for
> both end-user experience and catalog memory usage.
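
For a sense of scale, the figures quoted in the description can be turned
into rough per-table costs. This is only back-of-the-envelope arithmetic on
the reported (approximate) numbers, assuming every table costs about the
same and that requests really are issued one table at a time.

```python
# Reported figures from the issue description (all approximate).
tables = 57_000
roundtrips_per_table = 3

# Roundtrips a coordinator needs once catalogd has the tables loaded.
total_roundtrips = tables * roundtrips_per_table

cold_catalog_s = 18 * 60  # "18 minutes" on a cold catalog
warm_catalog_s = 70       # "~70 seconds" with a warm catalog, cold impalad

# Average wall-clock cost per table, in milliseconds.
cold_ms_per_table = cold_catalog_s / tables * 1000
warm_ms_per_table = warm_catalog_s / tables * 1000

print(total_roundtrips)             # 171000
print(round(cold_ms_per_table, 1))  # 18.9
print(round(warm_ms_per_table, 1))  # 1.2
```

Even at roughly a millisecond per table on a warm catalog, the total is
dominated by the sheer number of sequential requests.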



