Joe McDonnell created IMPALA-6567:

             Summary: Functional dataload is intermittently super-slow
                 Key: IMPALA-6567
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.12.0
            Reporter: Joe McDonnell

In recent GVO builds, the functional dataload intermittently takes almost 
2 hours, when it used to take ~30-35 minutes:
*02:12:15* Loading TPC-DS data (logging to /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
*02:34:27*   Loading workload 'tpch' using exploration strategy 'core' OK (Took: 22 min 12 sec)
*02:34:35*   Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 22 min 20 sec)
*04:11:40*   Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 119 min 25 sec)
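
To spot this in other build logs, the "Took:" values can be extracted and compared against a cutoff. The following is only a rough sketch: the regular expression is written against the three lines quoted above, and THRESHOLD_SEC is a hypothetical cutoff that is not part of any Impala tooling.

```python
import re

# Console lines as quoted in this report (timestamps omitted).
LOG = """\
Loading workload 'tpch' using exploration strategy 'core' OK (Took: 22 min 12 sec)
Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 22 min 20 sec)
Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 119 min 25 sec)
"""

# Captures the workload name plus the minutes and seconds from "(Took: M min S sec)".
PATTERN = re.compile(r"Loading workload '([^']+)'.*\(Took: (\d+) min (\d+) sec\)")


def load_times(text):
    """Return {workload: duration in seconds} for each 'Loading workload' line."""
    return {
        m.group(1): int(m.group(2)) * 60 + int(m.group(3))
        for m in PATTERN.finditer(text)
    }


# Hypothetical cutoff (assumption): flag any workload that takes over an hour.
THRESHOLD_SEC = 60 * 60

times = load_times(LOG)
slow = [name for name, secs in times.items() if secs > THRESHOLD_SEC]
print(times)
print(slow)
```

On the lines above, only 'functional-query' exceeds the one-hour cutoff.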

This has happened on multiple runs (including some in progress):



[] (missing some logs due to abort)

[] (in progress)

[] (in progress)


Dataload creates a SQL script that issues an "invalidate metadata 
${tablename}" command for each table it created. There are 830 such calls in 
the invocation of this script (see IMPALA-6386 for why we invalidate at the 
table level). Even so, this script should execute very 

The impalad.INFO from the 1370 run shows that this script is taking a long 
time. The first invalidate metadata for functional tables is at 2:41 and the 
last invalidate metadata for this run of the invalidate script is at 3:17, a 
span of about 36 minutes. 
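
For a rough sense of scale, the timestamps above can be turned into an average cost per statement. This is a back-of-the-envelope sketch only: it assumes all 830 invalidate statements fall between the two timestamps, which the log excerpt does not confirm, and the timestamps have minute resolution.

```python
from datetime import datetime

# Values taken from this report.
N_STATEMENTS = 830                           # "invalidate metadata" calls in the script
first = datetime.strptime("2:41", "%H:%M")   # first invalidate for functional tables
last = datetime.strptime("3:17", "%H:%M")    # last invalidate of the first script run

# Assumption: every one of the 830 statements ran inside this window.
elapsed_sec = (last - first).total_seconds()
per_statement = elapsed_sec / N_STATEMENTS

print(f"elapsed: {elapsed_sec / 60:.0f} min")
print(f"average per invalidate: {per_statement:.1f} s")
```

Under that assumption, each table-level invalidate averages a few seconds.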

The invalidate script runs twice. The second run begins at 3:19 and finishes at 


