This is an automated email from the ASF dual-hosted git repository.

dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 58da18a0a IMPALA-12655: Stabilize compute-table-stats.sh
58da18a0a is described below

commit 58da18a0a6f841dbdb85dbd267aa27e3fa2aea50
Author: Riza Suminto <[email protected]>
AuthorDate: Fri Feb 2 10:48:35 2024 -0800

    IMPALA-12655: Stabilize compute-table-stats.sh
    
    Impala data loading scripts run compute-table-stats.sh at the end of
    data loading to gather stats for certain test databases. The recent
    addition of tpcds_partitioned_parquet_snap caused instability during
    the compute stats run. The large number of partitions to update seems
    to saturate the number of connections to the underlying RDBMS of HMS,
    and HMS operations often time out after 30000ms.
    
    This patch attempts to alleviate the issue by running compute stats
    for tpcds_partitioned_parquet_snap serially.
    
    Testing:
    - Pass FE tests.
    
    Change-Id: I49e4790d2361c985673bca967559af62bda9b421
    Reviewed-on: http://gerrit.cloudera.org:8080/20989
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 testdata/bin/compute-table-stats.sh | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/testdata/bin/compute-table-stats.sh b/testdata/bin/compute-table-stats.sh
index 31a06f7ae..7ca914319 100755
--- a/testdata/bin/compute-table-stats.sh
+++ b/testdata/bin/compute-table-stats.sh
@@ -44,6 +44,10 @@ if [ "${TARGET_FILESYSTEM}" = "hdfs" ]; then
 fi
 ${COMPUTE_STATS_SCRIPT} --db_names=tpch,tpch_parquet,tpch_orc_def \
     --table_names=customer,lineitem,nation,orders,part,partsupp,region,supplier
-${COMPUTE_STATS_SCRIPT} \
-    --db_names=tpch_nested_parquet,tpcds,tpcds_parquet,tpcds_partitioned_parquet_snap
+${COMPUTE_STATS_SCRIPT} --db_names=tpch_nested_parquet,tpcds,tpcds_parquet
 ${COMPUTE_STATS_SCRIPT} --db_names=functional_kudu,tpch_kudu
+
+# Compute tables of tpcds_partitioned_parquet_snap serially
+# due to large number of partitions in some of the fact tables.
+${COMPUTE_STATS_SCRIPT} --db_names=tpcds_partitioned_parquet_snap \
+    --parallelism=1
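
As a rough illustration of what a parallelism limit of 1 means, the sketch
below runs one job per table name through xargs with a cap on concurrent jobs.
It is not taken from the Impala scripts: run_limited is a hypothetical helper,
the echo stands in for a real compute stats call, and the table names are only
examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Impala scripts: run one job per item,
# with at most "$1" jobs running at the same time.
run_limited() {
  local parallelism="$1"
  shift
  # xargs -P caps the number of concurrent jobs; -n 1 passes one item per job.
  # The echo is a stand-in for the real per-table work.
  printf '%s\n' "$@" |
    xargs -P "${parallelism}" -n 1 sh -c 'echo "compute stats $1"' _
}

# With a limit of 1, jobs run one after another in input order.
run_limited 1 store_sales web_sales catalog_sales
```

With a limit above 1 the jobs may finish in any order, which is where
concurrent load on a shared backend would come from.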
