(doris) branch master updated: [fix](test) Make test_analyze_long_string Case 5 stable against sample rows randomness (#64408)

morrysnow Sun, 14 Jun 2026 23:26:55 -0700

This is an automated email from the ASF dual-hosted git repository.

morrySnow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git



The following commit(s) were added to refs/heads/master by this push:
     new 9fb627b3469 [fix](test) Make test_analyze_long_string Case 5 stable against sample rows randomness (#64408)
9fb627b3469 is described below

commit 9fb627b3469366537d5491c685823fb92870239b
Author: yujun <[email protected]>
AuthorDate: Mon Jun 15 14:26:39 2026 +0800

    [fix](test) Make test_analyze_long_string Case 5 stable against sample rows randomness (#64408)
    
    ## Problem
    
    `test_analyze_long_string` Case 5 (and potentially Case 3) can be flaky
    because, right after the data is inserted, the BE may not yet have
    reported the row count to the FE.
    
    When `OlapAnalysisTask.doExecute()` runs with `info.rowCount == 0` and
    `tableSample != null`, it returns early without executing any SQL. The
    column finishes in the `FINISHED` state but with an empty message, so
    the skip reason expected from the `assert_true` long-string guard is
    never produced:
    
    ```
    expected skip reason visible for col big_str, got msg=
    ==> expected: <true> but was: <false>
    ```
    
    The audit log confirms that no sampling SQL was issued for `big_str` in
    the failing run; the task was short-circuited entirely.
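    The sketch below is a simplified model of that short-circuit, not Doris
    code. The names `TaskInfo`, `runSampleTask` and `maxColumnLength`, and
    the message strings, are hypothetical and exist only for illustration.
    Only the `rowCount == 0 && tableSample != null` condition comes from the
    description above. A task that runs before the BE reports its row count
    ends with an empty message. The same task with the row count available
    reaches the long-string guard.

    ```groovy
    // Hypothetical, simplified model of the early return described above.
    // TaskInfo and runSampleTask are illustrative names, not Doris APIs.
    class TaskInfo {
        long rowCount          // row count as known to the FE when the task runs
        Object tableSample     // non-null for "with sample ..." analyze jobs
        int maxColumnLength    // longest value in the column (illustrative)
    }

    // Returns the message the column task would finish with.
    String runSampleTask(TaskInfo info, int maxStringLength) {
        if (info.rowCount == 0 && info.tableSample != null) {
            // Early return: no sampling SQL is issued, so no skip reason is recorded.
            return ''
        }
        if (info.maxColumnLength > maxStringLength) {
            // Stands in for the long-string guard that produces the skip reason.
            return "skipped: column value longer than ${maxStringLength}"
        }
        return 'collected'
    }

    def stale = new TaskInfo(rowCount: 0, tableSample: 'ROWS 3', maxColumnLength: 2048)
    def ready = new TaskInfo(rowCount: 5, tableSample: 'ROWS 3', maxColumnLength: 2048)
    println "stale: msg=${runSampleTask(stale, 1024)}"
    println "ready: msg=${runSampleTask(ready, 1024)}"
    ```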
    
    ## Fix
    
    1. **Suite.groovy**: Add `waitRowCountReady(db, table, expectedRowCount)`,
       which polls `SHOW DATA FROM db.table` via `sql_return_maparray` until
       the BE-reported row count reaches the expected value.
    
    2. **test_analyze_long_string.groovy**: Call `waitRowCountReady` after
       inserts for both sample analyze cases:
       - Case 3 (sample percent 100)
       - Case 5 (sample rows 3, DUJ1 template)
    
    3. Case 5 data uses `repeat('z', 2048)` for all rows as a secondary
       defense against the row sampler missing the long row, even when the
       row count is reported correctly.
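    The polling in step 1 follows a common wait-until pattern. The sketch
    below approximates `waitRowCountReady` without any dependencies; it is
    not the committed helper. It replaces Awaitility with a plain loop. The
    name `fetchRowCount` is a hypothetical stand-in for reading `RowCount`
    from the first row of `SHOW DATA`. As in the helper, an empty or missing
    `RowCount` is treated as "not reported yet".

    ```groovy
    // Dependency-free sketch of the wait-until pattern used by waitRowCountReady.
    // fetchRowCount is a hypothetical stand-in for reading data[0].RowCount from
    // SHOW DATA; it may return null or '' while the BE has not reported yet.
    boolean waitUntilRowCount(Closure<String> fetchRowCount, long expected,
                              long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs
        while (true) {
            String rc = fetchRowCount()
            // Treat a missing or empty RowCount as "not reported yet".
            if (rc != null && rc != '' && (rc as long) >= expected) {
                return true
            }
            if (System.currentTimeMillis() >= deadline) {
                // The committed helper uses Awaitility, which throws on timeout instead.
                return false
            }
            Thread.sleep(pollMs)
        }
    }

    // Simulated reports: empty at first, then a partial count, then the full count.
    def reports = ['', '', '3', '5']
    int calls = 0
    Closure<String> fakeFetch = { reports[Math.min(calls++, reports.size() - 1)] }

    boolean ok = waitUntilRowCount(fakeFetch, 5L, 1000L, 10L)
    println "ready=${ok} after ${calls} polls"
    ```

    Two design choices matter here. Using `>=` rather than `==` keeps the
    wait from hanging if more rows than expected are reported. The bounded
    timeout turns a missing report into a clear failure instead of a hang.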
    
    Co-authored-by: Claude <[email protected]>
---
 .../org/apache/doris/regression/suite/Suite.groovy | 27 ++++++++++++++++++++++
 .../statistics/test_analyze_long_string.groovy     | 12 ++++++----
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy b/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy
index 3fc37eb7da3..0dc8f9d1737 100644
--- a/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy
+++ b/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy
@@ -2657,6 +2657,33 @@ class Suite implements GroovyInterceptable {
         return false;
     }
 
+    // Wait until the table row count reported by BE reaches the expected value.
+    // This is necessary before running sample analyze, because if row count is 0
+    // at task creation time, OlapAnalysisTask.doExecute() returns early without
+    // collecting any statistics, causing the analyze task to finish with an empty
+    // message instead of the expected skip reason.
+    void waitRowCountReady(String db, String table, long expectedRowCount) {
+        Awaitility.await().atMost(120, TimeUnit.SECONDS)
+                .pollInterval(3, TimeUnit.SECONDS).until {
+            def data = sql_return_maparray """SHOW DATA FROM ${db}.${table};"""
+            logger.info("SHOW DATA FROM ${db}.${table}: ${data}")
+            if (data.size() > 0) {
+                // Row 0 is the base index row which always has RowCount.
+                // The last row is "Total" whose RowCount may be empty.
+                def rc = data[0].RowCount
+                // RowCount may be empty string if BE hasn't reported yet.
+                if (rc == null || rc == '') {
+                    return false
+                }
+                def rowCount = (rc as long)
+                if (rowCount >= expectedRowCount) {
+                    return true
+                }
+            }
+            return false
+        }
+    }
+
     // Given tables to decide whether the table partition row count statistic is ready or not
     boolean is_partition_statistics_ready(db, tables)  {
         boolean isReady = true;
diff --git a/regression-test/suites/statistics/test_analyze_long_string.groovy b/regression-test/suites/statistics/test_analyze_long_string.groovy
index a234f44bad1..f9c0df470c2 100644
--- a/regression-test/suites/statistics/test_analyze_long_string.groovy
+++ b/regression-test/suites/statistics/test_analyze_long_string.groovy
@@ -162,6 +162,8 @@ suite("test_analyze_long_string", "nonConcurrent") {
     sql """insert into test_analyze_long_string_sample values(4, 'dd', 'short3')"""
     sql """insert into test_analyze_long_string_sample values(5, 'ee', 'short4')"""
 
+    waitRowCountReady("test_analyze_long_string", "test_analyze_long_string_sample", 5)
+
     setFeConfigTemporary([statistics_max_string_column_length: 1024]) {
         sql """analyze table test_analyze_long_string_sample with sample percent 100"""
         def jobId = findJobId("internal", "test_analyze_long_string", "test_analyze_long_string_sample")
@@ -232,10 +234,12 @@ suite("test_analyze_long_string", "nonConcurrent") {
         PROPERTIES ("replication_allocation" = "tag.location.default: 1");
     """
     sql """insert into test_analyze_long_string_duj1 values(1, 'aa', repeat('z', 2048))"""
-    sql """insert into test_analyze_long_string_duj1 values(2, 'bb', 'short1')"""
-    sql """insert into test_analyze_long_string_duj1 values(3, 'cc', 'short2')"""
-    sql """insert into test_analyze_long_string_duj1 values(4, 'dd', 'short3')"""
-    sql """insert into test_analyze_long_string_duj1 values(5, 'ee', 'short4')"""
+    sql """insert into test_analyze_long_string_duj1 values(2, 'bb', repeat('z', 2048))"""
+    sql """insert into test_analyze_long_string_duj1 values(3, 'cc', repeat('z', 2048))"""
+    sql """insert into test_analyze_long_string_duj1 values(4, 'dd', repeat('z', 2048))"""
+    sql """insert into test_analyze_long_string_duj1 values(5, 'ee', repeat('z', 2048))"""
+
+    waitRowCountReady("test_analyze_long_string", "test_analyze_long_string_duj1", 5)
 
     setFeConfigTemporary([statistics_max_string_column_length: 1024]) {
         GetDebugPoint().enableDebugPointForAllFEs('OlapAnalysisTask.useDUJ1Template')
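The Case 5 data change is easier to see with some arithmetic. This is purely an illustrative assumption, since this commit does not describe how the DUJ1 template actually samples. Suppose the 3-row sample is drawn uniformly without replacement from the 5 inserted rows. With the old data, a single long row among four short ones is missed with probability C(4,3)/C(5,3) = 4/10. The hypothetical `missProbability` helper below enumerates every 3-row subset to check that figure. It also shows that the miss probability drops to zero once every row is long.

```groovy
// Illustrative only: assumes uniform sampling of k rows without replacement,
// which may not match how the DUJ1 template actually samples.
// Returns the fraction of k-row samples that contain no long row.
double missProbability(List<Boolean> isLong, int k) {
    def indices = (0..<isLong.size()).toList()
    // Every distinct k-row sample, as a list of row indices.
    def samples = indices.subsequences().findAll { it.size() == k }
    def misses = samples.count { sample -> !sample.any { isLong[it] } }
    return misses / samples.size()
}

def before = [true, false, false, false, false] // one long row, four short ones
def after  = [true, true, true, true, true]     // every row is repeat('z', 2048)

println "miss probability before: ${missProbability(before, 3)}"
println "miss probability after:  ${missProbability(after, 3)}"
```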

