Alexander Behm resolved IMPALA-6567.
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

commit ad91e0b04cedb84b5b08c810de4ab1a5555ef036
Author: Alex Behm <alex.b...@cloudera.com>
Date:   Thu Feb 22 21:07:27 2018 -0800

    IMPALA-6567: ResetMetadataStmt analysis should not load tables.
    This fixes a regression introduced by IMPALA-5152 where
    invalidate metadata <tbl> and refresh <tbl> accidentally
    required the target table to be loaded during analysis,
    ultimately leading to a double load in some situations
    (load during analysis, then another load during execution).
    Since the purpose of these statements is to reload
    metadata, it does not make sense to require a table load
    during analysis; that load happens during execution.
    Note that REFRESH <tbl> PARTITION (<partition>) still
    requires the containing table to be loaded. This was
    the behavior before IMPALA-5152 and this patch does
    not attempt to improve that.
    - added new unit test
    - ran FE tests locally
    - validated the desired behavior by inspecting logs
      and the timeline from invalidate/refresh statements
    Change-Id: I7033781ebf27ea53cfd26ff0e4f74d4f242bd1dc
    Reviewed-on: http://gerrit.cloudera.org:8080/9418
    Tested-by: Impala Public Jenkins
    Reviewed-by: Alex Behm <alex.b...@cloudera.com>
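
To make the analysis-time behavior described in the commit message concrete, here is a minimal Python sketch under a deliberately simplified model. `ResetMetadataStmt` is a hypothetical stand-in for the Java frontend class named in the commit title; its fields, `tables_to_load_during_analysis`, and `loads_per_statement` are invented for illustration and are not Impala's API. The model assumes every table-level statement loads its table exactly once during execution, which is the worst case the commit message calls a double load when analysis also loads the table.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class ResetMetadataStmt:
    """Hypothetical stand-in for INVALIDATE METADATA / REFRESH statements."""
    is_refresh: bool                      # True for REFRESH, False for INVALIDATE METADATA
    table: Optional[str] = None           # None models a global INVALIDATE METADATA
    partition_spec: Optional[str] = None  # set only for REFRESH <tbl> PARTITION (...)

    def tables_to_load_during_analysis(self) -> Set[str]:
        # After the fix, only REFRESH <tbl> PARTITION (...) still needs its
        # containing table loaded during analysis (the pre-IMPALA-5152 behavior).
        if self.is_refresh and self.table is not None and self.partition_spec is not None:
            return {self.table}
        # INVALIDATE METADATA <tbl> and REFRESH <tbl> defer loading to execution.
        return set()


def loads_per_statement(stmt: ResetMetadataStmt, regressed: bool) -> int:
    """Count table loads for one statement under the simplified model.

    Assumption: any table-level statement loads its table once during
    execution. regressed=True mimics the IMPALA-5152 regression, where every
    table-level statement also loaded its table during analysis.
    """
    if stmt.table is None:
        return 0  # the global form is outside this sketch
    execution_loads = 1
    if regressed:
        analysis_loads = 1
    else:
        analysis_loads = len(stmt.tables_to_load_during_analysis())
    return analysis_loads + execution_loads


if __name__ == "__main__":
    inv = ResetMetadataStmt(is_refresh=False, table="functional.alltypes")
    print(loads_per_statement(inv, regressed=True))   # 2
    print(loads_per_statement(inv, regressed=False))  # 1
```

Under this model the fix halves the loads for a plain `invalidate metadata <tbl>` (from 2 to 1), while `REFRESH <tbl> PARTITION (...)` stays at 2 either way, matching the note that the partition form was intentionally left unchanged.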

> Functional dataload is intermittently super-slow
> ------------------------------------------------
>                 Key: IMPALA-6567
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6567
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Joe McDonnell
>            Assignee: Alexander Behm
>            Priority: Blocker
>             Fix For: Impala 2.12.0
> Recent GVO builds intermittently have a functional dataload that takes almost
> 2 hours, when it used to take ~30-35 minutes:
> {noformat}
> 02:12:15 Loading TPC-DS data (logging to 
> /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
> 02:34:27 Loading workload 'tpch' using exploration strategy 'core' OK (Took: 
> 22 min 12 sec)
> 02:34:35 Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 
> 22 min 20 sec)
> 04:11:40 Loading workload 'functional-query' using exploration strategy 
> 'exhaustive' OK (Took: 119 min 25 sec)
> {noformat}
> This has happened on multiple runs (including some in progress):
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1370/]
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1382/]
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1383/] (missing some 
> logs due to abort)
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1384/] (in progress)
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1385/] (in progress)
> Dataload creates a SQL script that invalidates each created table with an
> "invalidate metadata ${tablename}" command. There are 830 "invalidate 
> metadata ${tablename}" calls in the invocation of this script (see 
> IMPALA-6386 for why we do invalidate at the table level). Even so, this 
> script should execute very quickly.
> The impalad.INFO from the 1370 run shows that this script is taking a long 
> time. The first invalidate metadata for functional tables is at 2:41 and the 
> last invalidate metadata for this run of the invalidate script is at 3:17. 
> The invalidate script runs twice. The second run begins at 3:19 and finishes 
> at 4:11. 
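
The log timestamps above can be turned into a rough per-statement cost. The Python sketch below assumes each of the two runs issues all 830 statements and works only at the minute resolution the description gives. `build_invalidate_script`, `avg_seconds_per_statement`, the example table names, and the trailing semicolons are hypothetical illustrations, not Impala's actual dataload code.

```python
from datetime import datetime
from typing import Iterable


def build_invalidate_script(table_names: Iterable[str]) -> str:
    """Emit one table-level INVALIDATE METADATA statement per table name.

    The semicolon terminator is an assumption about the script's format.
    """
    return "\n".join(f"invalidate metadata {name};" for name in table_names)


def avg_seconds_per_statement(start: str, end: str, num_statements: int) -> float:
    """Average wall-clock seconds per statement between two same-day H:MM times."""
    fmt = "%H:%M"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / num_statements


if __name__ == "__main__":
    # Hypothetical table names; the real script issues 830 such statements.
    print(build_invalidate_script(["functional.alltypes", "functional.alltypestiny"]))
    # Start/end times of the two runs in the 1370 impalad.INFO, per the description.
    print(round(avg_seconds_per_statement("2:41", "3:17", 830), 1))  # 2.6
    print(round(avg_seconds_per_statement("3:19", "4:11", 830), 1))  # 3.8
```

Under those assumptions the first run averages about 2.6 seconds per statement and the second about 3.8 seconds, which is hard to reconcile with the expectation that a script of table-level invalidates "should execute very quickly".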

This message was sent by Atlassian JIRA
