IMPALA-6070: Parallelize another bit of data load.

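The two Kudu loads and the Hive UDF build can all run in
parallel. This should shave about 4 minutes off the data load:
run serially, the three steps take about 8 minutes; run in
parallel, they are bounded by the longest step, about 4 minutes.
(Current timings are 3.5, 4, and 0.7 minutes; see below.)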

I've run dataload with this change many times.

   Loading Kudu functional (logging to /home/ubuntu/Impala/logs/data_loading/load-kudu.log)...
     Loading workload 'functional-query' using exploration strategy 'core' in table formats 'kudu/none/none' OK (Took: 3 min 29 sec)
   Loading Kudu TPCH (logging to /home/ubuntu/Impala/logs/data_loading/load-kudu-tpch.log)...
     Loading workload 'tpch' using exploration strategy 'core' in table formats 'kudu/none/none' OK (Took: 4 min 0 sec)
   Loading Hive UDFs (logging to /home/ubuntu/Impala/logs/data_loading/build-and-copy-hive-udfs.log)...
     Loading Hive UDFs OK (Took: 0 min 41 sec)
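
The change relies on the run-step-backgroundable and
run-step-wait-all helpers in create-load-data.sh, whose
definitions are not part of this diff. As a rough sketch of the
general pattern only (not the actual Impala implementation), the
idea of starting several steps in the background and then waiting
for all of them could look like this in bash. The names bg_step
and wait_all are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "background steps, then wait for all".
# bg_step and wait_all are illustrative names, NOT the Impala helpers.

PIDS=()    # PIDs of the steps started in the background
NAMES=()   # descriptions, kept in the same order as PIDS

# bg_step NAME LOGFILE COMMAND [ARGS...]
# Starts COMMAND in the background with stdout/stderr sent to LOGFILE.
bg_step() {
  local name="$1" log="$2"
  shift 2
  "$@" >"$log" 2>&1 &
  PIDS+=("$!")
  NAMES+=("$name")
}

# wait_all
# Waits for every step started by bg_step. Returns non-zero if any
# step exited with a non-zero status, after waiting for all of them.
wait_all() {
  local failed=0 i
  for i in "${!PIDS[@]}"; do
    if ! wait "${PIDS[$i]}"; then
      echo "FAILED: ${NAMES[$i]}" >&2
      failed=1
    fi
  done
  PIDS=()
  NAMES=()
  return "$failed"
}
```

With helpers like these, three independent steps would each be
started with bg_step and followed by a single wait_all, so the
total wall-clock time is roughly that of the slowest step rather
than the sum of all three, assuming the steps do not contend for
the same resources.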

Change-Id: I7e93ee5a77ec9271b980b88bef7ad512ecbe0407
Reviewed-on: http://gerrit.cloudera.org:8080/8822
Reviewed-by: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/11dbb395
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/11dbb395
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/11dbb395

Branch: refs/heads/master
Commit: 11dbb3952a1c598f27de281c5020ed2df325d6e8
Parents: 5c593be
Author: Philip Zeyliger <phi...@cloudera.com>
Authored: Tue Dec 12 15:38:54 2017 -0800
Committer: Impala Public Jenkins <impala-public-jenk...@gerrit.cloudera.org>
Committed: Thu Dec 14 02:28:40 2017 +0000

----------------------------------------------------------------------
 testdata/bin/create-load-data.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/11dbb395/testdata/bin/create-load-data.sh
----------------------------------------------------------------------
diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index 099fe59..df6622a 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -507,13 +507,14 @@ fi
 
 if $KUDU_IS_SUPPORTED; then
   # Tests depend on the kudu data being clean, so load the data from scratch.
-  run-step "Loading Kudu functional" load-kudu.log \
+  run-step-backgroundable "Loading Kudu functional" load-kudu.log \
         load-data "functional-query" "core" "kudu/none/none" force
-  run-step "Loading Kudu TPCH" load-kudu-tpch.log \
+  run-step-backgroundable "Loading Kudu TPCH" load-kudu-tpch.log \
         load-data "tpch" "core" "kudu/none/none" force
 fi
-run-step "Loading Hive UDFs" build-and-copy-hive-udfs.log \
+run-step-backgroundable "Loading Hive UDFs" build-and-copy-hive-udfs.log \
     build-and-copy-hive-udfs
+run-step-wait-all
 run-step "Running custom post-load steps" custom-post-load-steps.log \
     custom-post-load-steps
 
