[ 
https://issues.apache.org/jira/browse/IMPALA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Behm resolved IMPALA-5152.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

commit 8ea1ce87e2150c843b4da15f9d42b87006e6ffca
Author: Alex Behm <alex.b...@cloudera.com>
Date:   Fri Apr 7 09:58:40 2017 -0700

    IMPALA-5152: Introduce metadata loading phase
    
    Reworks the collection and loading of missing metadata
    when compiling a statement. Introduces a new
    metadata-loading phase between parsing and analysis.
    Summary of the new compilation flow:
    1. Parse statement.
    2. Collect all table references from the parsed
       statement and generate a list of tables that need
       to be loaded for analysis to succeed.
    3. Request missing metadata and wait for it to arrive.
       As views become loaded we expand the set of required
       tables based on the view definitions.
       This step populates a statement-local table cache
       that contains all loaded tables relevant to the
       statement.
    4. Create a new Analyzer with the table cache and
       analyze the statement. During analysis only the
       table cache is consulted for table metadata; the
       ImpaladCatalog is no longer used for that purpose.
    5. Authorize the statement.
    6. Generate the plan as usual.
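
The loading loop in step 3 can be sketched roughly as follows. This is a minimal illustration, not the code from the change: the class `MetadataLoadSketch`, the `Tbl` and `Catalog` types, and the `loadBatch` method are all hypothetical names, and the sketch assumes the catalog can return metadata for a whole batch of names in one request. Each round requests every table currently known to be missing, and loaded views contribute the tables from their definitions to the next round.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a statement-local table cache; not Impala's actual classes. */
final class MetadataLoadSketch {
  /** A loaded table. For a view, viewRefs names the tables in its definition. */
  static final class Tbl {
    final String name;
    final List<String> viewRefs; // empty for base tables
    Tbl(String name, List<String> viewRefs) {
      this.name = name;
      this.viewRefs = viewRefs;
    }
  }

  /** Stand-in for the catalog (assumption): returns metadata for a batch of names. */
  interface Catalog {
    Map<String, Tbl> loadBatch(Set<String> names);
  }

  /**
   * Loads every table the statement needs and returns the table cache.
   * Each round's request is appended to requests so callers can inspect batching.
   */
  static Map<String, Tbl> loadTables(Set<String> referenced, Catalog catalog,
      List<Set<String>> requests) {
    Map<String, Tbl> cache = new LinkedHashMap<>();
    Set<String> missing = new LinkedHashSet<>(referenced);
    while (!missing.isEmpty()) {
      // One request per round covers all tables known to be missing.
      requests.add(missing);
      Map<String, Tbl> loaded = catalog.loadBatch(missing);
      cache.putAll(loaded);
      // Expand the required set with tables referenced by newly loaded views.
      Set<String> next = new LinkedHashSet<>();
      for (Tbl t : loaded.values()) {
        for (String ref : t.viewRefs) {
          if (!cache.containsKey(ref)) next.add(ref);
        }
      }
      missing = next;
    }
    return cache;
  }
}
```

In this model, a statement over a view `v` (defined over `a` and `b`) and a table `c` needs two requests: one for `v` and `c`, then one for `a` and `b`. The number of rounds depends on view nesting depth, not on the number of tables.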
    
    The intent of the existing code was to collect all tables
    missing metadata during analysis, load the metadata, and then
    re-analyze the statement (and repeat those steps until all
    metadata is loaded).
    Unfortunately, the relevant code was hard to follow, subtle,
    and not well tested, so it was broken in several ways over
    time. For example, the introduction of path analysis for
    nested types subtly broke the intended behavior, and there
    are other similar examples.
    
    The serial table loading observed in the JIRA was caused by the
    following code in the resolution of table references:
    for (all path interpretations) {
      try {
        // Try to resolve the path; might call getTable() which
        // throws for nonexistent tables.
      } catch (AnalysisException e) {
        if (analyzer.hasMissingTbls()) throw e;
      }
    }
    
    The following example illustrates the problem:
    SELECT * FROM a.b, x.y
    When resolving the path "a.b" we consider that "a" could be a
    database or a table. Similarly, "b" could be a table or a
    nested collection.
    If the path resolution for "a.b" adds a missing table entry,
    then the path resolution for "x.y" could exit prematurely,
    without trying the other path interpretations that would
    lead to adding the expected missing table. So effectively,
    the tables end up being loaded one-by-one.
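
The failure mode described above can be reproduced with a small model. This is a simplified sketch under stated assumptions, not the original frontend code: the class `SerialLoadSketch` and all its members are hypothetical, each two-part path is tried first as a table in a `default` database and then as database plus table, and the caller is assumed to continue with the next table reference after an `AnalysisException` and to re-analyze after each load.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the old resolution loop; names are illustrative only. */
final class SerialLoadSketch {
  static final class AnalysisException extends Exception {
    AnalysisException(String msg) { super(msg); }
  }

  /** Analyzer state for one analysis pass. */
  static final class Analyzer {
    final Set<String> catalogTables; // tables that exist at all
    final Set<String> loaded;        // tables whose metadata is already loaded
    final Set<String> missing = new LinkedHashSet<>();
    Analyzer(Set<String> catalogTables, Set<String> loaded) {
      this.catalogTables = catalogTables;
      this.loaded = loaded;
    }
    boolean hasMissingTbls() { return !missing.isEmpty(); }
    /** Throws for nonexistent tables and for tables whose metadata is missing. */
    void getTable(String db, String tbl) throws AnalysisException {
      String name = db + "." + tbl;
      if (!catalogTables.contains(name)) throw new AnalysisException("No such table: " + name);
      if (!loaded.contains(name)) {
        missing.add(name);
        throw new AnalysisException("Missing metadata: " + name);
      }
    }
  }

  /** Tries each interpretation; rethrows early once any table is marked missing. */
  static void resolve(String[] path, Analyzer analyzer) throws AnalysisException {
    String[][] interpretations = { {"default", path[0]}, {path[0], path[1]} };
    for (String[] i : interpretations) {
      try {
        analyzer.getTable(i[0], i[1]);
        return;
      } catch (AnalysisException e) {
        // The problematic check: a missing table from an EARLIER path aborts this one.
        if (analyzer.hasMissingTbls()) throw e;
      }
    }
    throw new AnalysisException("Could not resolve " + String.join(".", path));
  }

  /** Re-analyzes until nothing is missing; returns the load request of each pass. */
  static List<Set<String>> loadRequests(List<String[]> paths, Set<String> catalogTables) {
    List<Set<String>> requests = new ArrayList<>();
    Set<String> loaded = new HashSet<>();
    while (true) {
      Analyzer analyzer = new Analyzer(catalogTables, loaded);
      for (String[] path : paths) {
        try {
          resolve(path, analyzer);
        } catch (AnalysisException e) {
          // Assumption: analysis moves on to the next table reference.
        }
      }
      if (!analyzer.hasMissingTbls()) return requests;
      requests.add(analyzer.missing);
      loaded.addAll(analyzer.missing);
    }
  }
}
```

For the paths `a.b` and `x.y`, this model issues two separate requests. In the first pass, resolving `x.y` fails on the interpretation `default.x` and is rethrown because `a.b` is already marked missing, so `x.y` is never registered until the next pass.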
    
    Testing:
    - A core/hdfs run succeeded
    - No new tests were added because the existing functional tests
      provide good coverage of various metadata loading scenarios.
    - The issue reported in IMPALA-5152 is basically impossible now.
      Adding FE unit tests for that bug specifically would require
      ugly changes to the new code to enable such testing.
    
    Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55
    Reviewed-on: http://gerrit.cloudera.org:8080/8958
    Reviewed-by: Alex Behm <alex.b...@cloudera.com>
    Tested-by: Impala Public Jenkins


> Frontend requests metadata for one table at a time in the query 
> ----------------------------------------------------------------
>
>                 Key: IMPALA-5152
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5152
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Alexander Behm
>            Priority: Critical
>              Labels: Performance, frontend
>             Fix For: Impala 2.12.0
>
>
> It appears that the Frontend serializes loading metadata for missing tables 
> in a query. The Catalog log shows that the queue size is always 0. 
> The query below references 9 tables, and metadata is loaded for one table at a 
> time. 
> {code}
> explain select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as 
> store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave 
> ,stddev_samp(ss_quantity) as store_sales_quantitystdev 
> ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov 
> ,count(sr_return_quantity) as store_returns_quantitycount 
> ,avg(sr_return_quantity) as store_returns_quantityave 
> ,stddev_samp(sr_return_quantity) as store_returns_quantitystdev 
> ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount 
> ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity) as 
> catalog_sales_quantitystdev ,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitycov from store_sales ,store_returns ,catalog_sales 
> ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,item where d1.d_quarter_name = 
> '2000Q1' and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and 
> s_store_sk = ss_store_sk and ss_customer_sk = sr_customer_sk and ss_item_sk = 
> sr_item_sk and ss_ticket_number = sr_ticket_number and sr_returned_date_sk = 
> d2.d_date_sk and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') and 
> sr_customer_sk = cs_bill_customer_sk and sr_item_sk = cs_item_sk and 
> cs_sold_date_sk = d3.d_date_sk and d3.d_quarter_name in 
> ('2000Q1','2000Q2','2000Q3') group by i_item_id ,i_item_desc ,s_state order 
> by i_item_id ,i_item_desc ,s_state limit 100
> {code}
> Catalog log
> {code}
> I0403 14:17:32.471273 57286 TableLoadingMgr.java:285] Loading next table from 
> queue: tpcds_1000_parquet.store_sales
> I0403 14:17:32.471375 57286 TableLoadingMgr.java:287] Remaining items in 
> queue: 0. Loads in progress: 0
> I0403 14:17:32.471560 34156 TableLoader.java:58] Loading metadata for: 
> tpcds_1000_parquet.store_sales
> I0403 14:17:32.485390 34156 HdfsTable.java:1145] Fetching partition metadata 
> from the Metastore: tpcds_1000_parquet.store_sales
> I0403 14:17:32.760711 34156 HdfsTable.java:1149] Fetched partition metadata 
> from the Metastore: tpcds_1000_parquet.store_sales
> I0403 14:17:33.958519 34156 HdfsTable.java:844] Loading file and block 
> metadata for 1824 partitions from 1 paths: tpcds_1000_parquet.store_sales
> I0403 14:17:34.392324 34156 HdfsTable.java:848] Loaded file and block 
> metadata for 1824 partitions from 1 paths: tpcds_1000_parquet.store_sales
> I0403 14:17:34.392421 34156 TableLoader.java:97] Loaded metadata for: 
> tpcds_1000_parquet.store_sales
> I0403 14:17:36.058523 57304 catalog-server.cc:320] Publishing update: 
> TABLE:tpcds_1000_parquet.store_sales@3840
> I0403 14:17:36.065404 57304 catalog-server.cc:320] Publishing update: 
> CATALOG:44dafc1672d34719:bf64b7285d2a5912@3840
> I0403 14:17:38.279191 57271 TableLoadingMgr.java:285] Loading next table from 
> queue: tpcds_1000_parquet.store_returns
> I0403 14:17:38.279278 57271 TableLoadingMgr.java:287] Remaining items in 
> queue: 0. Loads in progress: 0
> I0403 14:17:38.279422 34244 TableLoader.java:58] Loading metadata for: 
> tpcds_1000_parquet.store_returns
> I0403 14:17:38.308568 34244 HdfsTable.java:1145] Fetching partition metadata 
> from the Metastore: tpcds_1000_parquet.store_returns
> I0403 14:17:38.579197 34244 HdfsTable.java:1149] Fetched partition metadata 
> from the Metastore: tpcds_1000_parquet.store_returns
> I0403 14:17:39.897581 34244 HdfsTable.java:844] Loading file and block 
> metadata for 2004 partitions from 1 paths: tpcds_1000_parquet.store_returns
> I0403 14:17:40.371350 34244 HdfsTable.java:848] Loaded file and block 
> metadata for 2004 partitions from 1 paths: tpcds_1000_parquet.store_returns
> I0403 14:17:40.371443 34244 TableLoader.java:97] Loaded metadata for: 
> tpcds_1000_parquet.store_returns
> I0403 14:17:42.088232 57304 catalog-server.cc:320] Publishing update: 
> TABLE:tpcds_1000_parquet.store_returns@3841
> I0403 14:17:42.092733 57304 catalog-server.cc:320] Publishing update: 
> CATALOG:44dafc1672d34719:bf64b7285d2a5912@3841
> I0403 14:17:44.361759 57273 TableLoadingMgr.java:285] Loading next table from 
> queue: tpcds_1000_parquet.catalog_sales
> I0403 14:17:44.361835 57273 TableLoadingMgr.java:287] Remaining items in 
> queue: 0. Loads in progress: 0
> I0403 14:17:44.362061 34289 TableLoader.java:58] Loading metadata for: 
> tpcds_1000_parquet.catalog_sales
> I0403 14:17:44.377027 34289 HdfsTable.java:1145] Fetching partition metadata 
> from the Metastore: tpcds_1000_parquet.catalog_sales
> I0403 14:17:44.650100 34289 HdfsTable.java:1149] Fetched partition metadata 
> from the Metastore: tpcds_1000_parquet.catalog_sales
> I0403 14:17:45.819257 34289 HdfsTable.java:844] Loading file and block 
> metadata for 1837 partitions from 1 paths: tpcds_1000_parquet.catalog_sales
> I0403 14:17:46.264878 34289 HdfsTable.java:848] Loaded file and block 
> metadata for 1837 partitions from 1 paths: tpcds_1000_parquet.catalog_sales
> I0403 14:17:46.264987 34289 TableLoader.java:97] Loaded metadata for: 
> tpcds_1000_parquet.catalog_sales
> I0403 14:17:48.093703 57304 catalog-server.cc:320] Publishing update: 
> TABLE:tpcds_1000_parquet.catalog_sales@3842
> I0403 14:17:48.098681 57304 catalog-server.cc:320] Publishing update: 
> CATALOG:44dafc1672d34719:bf64b7285d2a5912@3842
> I0403 14:17:50.438555 57272 TableLoadingMgr.java:285] Loading next table from 
> queue: tpcds_1000_parquet.date_dim
> I0403 14:17:50.438663 57272 TableLoadingMgr.java:287] Remaining items in 
> queue: 0. Loads in progress: 0
> I0403 14:17:50.438886 34319 TableLoader.java:58] Loading metadata for: 
> tpcds_1000_parquet.date_dim
> I0403 14:17:50.454288 34319 HdfsTable.java:1145] Fetching partition metadata 
> from the Metastore: tpcds_1000_parquet.date_dim
> I0403 14:17:50.455581 34319 HdfsTable.java:1149] Fetched partition metadata 
> from the Metastore: tpcds_1000_parquet.date_dim
> I0403 14:17:50.458034 34319 HdfsTable.java:844] Loading file and block 
> metadata for 1 partitions from 1 paths: tpcds_1000_parquet.date_dim
> I0403 14:17:50.458940 34319 HdfsTable.java:848] Loaded file and block 
> metadata for 1 partitions from 1 paths: tpcds_1000_parquet.date_dim
> I0403 14:17:50.459019 34319 TableLoader.java:97] Loaded metadata for: 
> tpcds_1000_parquet.date_dim
> I0403 14:17:52.067752 57304 catalog-server.cc:320] Publishing update: 
> TABLE:tpcds_1000_parquet.date_dim@3843
> I0403 14:17:52.068792 57304 catalog-server.cc:320] Publishing update: 
> CATALOG:44dafc1672d34719:bf64b7285d2a5912@3843
> I0403 14:17:54.451196 57276 TableLoadingMgr.java:285] Loading next table from 
> queue: tpcds_1000_parquet.store
> I0403 14:17:54.451275 57276 TableLoadingMgr.java:287] Remaining items in 
> queue: 0. Loads in progress: 0
> I0403 14:17:54.451402 34392 TableLoader.java:58] Loading metadata for: 
> tpcds_1000_parquet.store
> I0403 14:17:54.464722 34392 HdfsTable.java:1145] Fetching partition metadata 
> from the Metastore: tpcds_1000_parquet.store
> I0403 14:17:54.466107 34392 HdfsTable.java:1149] Fetched partition metadata 
> from the Metastore: tpcds_1000_parquet.store
> I0403 14:17:54.468161 34392 HdfsTable.java:844] Loading file and block 
> metadata for 1 partitions from 1 paths: tpcds_1000_parquet.store
> I0403 14:17:54.468992 34392 HdfsTable.java:848] Loaded file and block 
> metadata for 1 partitions from 1 paths: tpcds_1000_parquet.store
> I0403 14:17:54.469070 34392 TableLoader.java:97] Loaded metadata for: 
> tpcds_1000_parquet.store
> I0403 14:17:56.036121 57304 catalog-server.cc:320] Publishing update: 
> TABLE:tpcds_1000_parquet.store@3844
> I0403 14:17:56.037204 57304 catalog-server.cc:320] Publishing update: 
> CATALOG:44dafc1672d34719:bf64b7285d2a5912@3844
> I0403 14:17:58.457381 57274 TableLoadingMgr.java:285] Loading next table from 
> queue: tpcds_1000_parquet.item
> I0403 14:17:58.457473 57274 TableLoadingMgr.java:287] Remaining items in 
> queue: 0. Loads in progress: 0
> I0403 14:17:58.457653 34456 TableLoader.java:58] Loading metadata for: 
> tpcds_1000_parquet.item
> I0403 14:17:58.470528 34456 HdfsTable.java:1145] Fetching partition metadata 
> from the Metastore: tpcds_1000_parquet.item
> I0403 14:17:58.471864 34456 HdfsTable.java:1149] Fetched partition metadata 
> from the Metastore: tpcds_1000_parquet.item
> I0403 14:17:58.474072 34456 HdfsTable.java:844] Loading file and block 
> metadata for 1 partitions from 1 paths: tpcds_1000_parquet.item
> I0403 14:17:58.474925 34456 HdfsTable.java:848] Loaded file and block 
> metadata for 1 partitions from 1 paths: tpcds_1000_parquet.item
> I0403 14:17:58.475021 34456 TableLoader.java:97] Loaded metadata for: 
> tpcds_1000_parquet.item
> I0403 14:18:00.036249 57304 catalog-server.cc:320] Publishing update: 
> TABLE:tpcds_1000_parquet.item@3845
> I0403 14:18:00.037330 57304 catalog-server.cc:320] Publishing update: 
> CATALOG:44dafc1672d34719:bf64b7285d2a5912@3845
> {code}
> Coordinator node log
> {code}
> I0403 14:17:32.471491 37742 Frontend.java:833] Requesting prioritized load of 
> table(s): tpcds_1000_parquet.store_sales
> I0403 14:17:38.279330 37742 Frontend.java:833] Requesting prioritized load of 
> table(s): tpcds_1000_parquet.store_returns
> I0403 14:17:44.361925 37742 Frontend.java:833] Requesting prioritized load of 
> table(s): tpcds_1000_parquet.catalog_sales
> I0403 14:17:50.438707 37742 Frontend.java:833] Requesting prioritized load of 
> table(s): tpcds_1000_parquet.date_dim
> I0403 14:17:54.451408 37742 Frontend.java:833] Requesting prioritized load of 
> table(s): tpcds_1000_parquet.store
> I0403 14:17:58.457484 37742 Frontend.java:833] Requesting prioritized load of 
> table(s): tpcds_1000_parquet.item
> I0403 14:18:02.465189 37742 Frontend.java:928] Compiled query.
> I0403 14:18:02.593619 37742 impala-beeswax-server.cc:190] 
> get_results_metadata(): query_id=664050071b49c3c8:10b184b900000000
> I0403 14:18:02.618315 37742 impala-beeswax-server.cc:233] close(): 
> query_id=664050071b49c3c8:10b184b900000000
> I0403 14:18:02.618413 37742 impala-server.cc:921] UnregisterQuery(): 
> query_id=664050071b49c3c8:10b184b900000000
> I0403 14:18:02.618422 37742 impala-server.cc:1007] Cancel(): 
> query_id=664050071b49c3c8:10b184b900000000
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
