This is an automated email from the ASF dual-hosted git repository.
dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 58da18a0a IMPALA-12655: Stabilize compute-table-stats.sh
58da18a0a is described below
commit 58da18a0a6f841dbdb85dbd267aa27e3fa2aea50
Author: Riza Suminto <[email protected]>
AuthorDate: Fri Feb 2 10:48:35 2024 -0800
IMPALA-12655: Stabilize compute-table-stats.sh
Impala data loading scripts run compute-table-stats.sh at the end of
data loading to gather stats for certain test databases. The recent
addition of tpcds_partitioned_parquet_snap caused instability during the
compute stats run. The large number of partitions to update seems to
saturate the number of connections to the underlying RDBMS of HMS, and
HMS operations often time out after 30000ms.
This patch attempts to alleviate the issue by running compute stats for
tpcds_partitioned_parquet_snap serially.
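As a rough illustration of why capping parallelism at 1 reduces pressure on
a shared backend, the sketch below runs one command per item with a bounded
number of concurrent workers. The helper name run_with_parallelism and the
echoed command are hypothetical and are not part of the Impala repository;
the table names are only sample inputs. How the real compute stats script
schedules its work is not shown in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not Impala code):
# runs one command per item with at most $1 concurrent workers.
run_with_parallelism() {
  local parallelism="$1"
  shift
  # xargs -P bounds the number of concurrent child processes;
  # -I{} makes xargs start one child per input line.
  printf '%s\n' "$@" |
    xargs -P "${parallelism}" -I{} sh -c 'echo "compute stats $1"' _ {}
}

# With a parallelism of 1, only one child runs at a time, so items are
# processed one after another in input order.
run_with_parallelism 1 store_sales web_sales catalog_sales
```

Raising the first argument lets several children run at once, which is
faster but means more simultaneous requests against whatever service each
child talks to.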
Testing:
- Pass FE tests.
Change-Id: I49e4790d2361c985673bca967559af62bda9b421
Reviewed-on: http://gerrit.cloudera.org:8080/20989
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
testdata/bin/compute-table-stats.sh | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/testdata/bin/compute-table-stats.sh b/testdata/bin/compute-table-stats.sh
index 31a06f7ae..7ca914319 100755
--- a/testdata/bin/compute-table-stats.sh
+++ b/testdata/bin/compute-table-stats.sh
@@ -44,6 +44,10 @@ if [ "${TARGET_FILESYSTEM}" = "hdfs" ]; then
fi
${COMPUTE_STATS_SCRIPT} --db_names=tpch,tpch_parquet,tpch_orc_def \
--table_names=customer,lineitem,nation,orders,part,partsupp,region,supplier
-${COMPUTE_STATS_SCRIPT} \
- --db_names=tpch_nested_parquet,tpcds,tpcds_parquet,tpcds_partitioned_parquet_snap
+${COMPUTE_STATS_SCRIPT} --db_names=tpch_nested_parquet,tpcds,tpcds_parquet
${COMPUTE_STATS_SCRIPT} --db_names=functional_kudu,tpch_kudu
+
+# Compute tables of tpcds_partitioned_parquet_snap serially
+# due to large number of partitions in some of the fact tables.
+${COMPUTE_STATS_SCRIPT} --db_names=tpcds_partitioned_parquet_snap \
+ --parallelism=1