This is an automated email from the ASF dual-hosted git repository.
morrySnow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 9fb627b3469 [fix](test) Make test_analyze_long_string Case 5 stable against sample rows randomness (#64408)
9fb627b3469 is described below
commit 9fb627b3469366537d5491c685823fb92870239b
Author: yujun <[email protected]>
AuthorDate: Mon Jun 15 14:26:39 2026 +0800
[fix](test) Make test_analyze_long_string Case 5 stable against sample rows randomness (#64408)
## Problem
`test_analyze_long_string` Case 5 (and potentially Case 3) can be flaky because, right after the data is inserted, the BE may not yet have reported the row count to the FE.
When `OlapAnalysisTask.doExecute()` runs with `info.rowCount == 0` and `tableSample != null`, it returns early without executing any SQL. The column finishes in the `FINISHED` state with an empty message, so the skip reason expected from the `assert_true` long-string guard is never produced:
```
expected skip reason visible for col big_str, got msg=
==> expected: <true> but was: <false>
```
The audit log confirms that no sampling SQL was issued for `big_str` in the failing run; the task was short-circuited entirely.
## Fix
1. **Suite.groovy**: Add `waitRowCountReady(db, table, expectedRowCount)`, which polls `SHOW DATA FROM db.table` via `sql_return_maparray` (every 3 seconds, for up to 120 seconds) until the BE-reported row count reaches the expected value.
2. **test_analyze_long_string.groovy**: Call `waitRowCountReady` after the inserts in both sample-analyze cases:
- Case 3 (sample percent 100)
- Case 5 (sample rows 3, DUJ1 template)
3. Case 5 now uses `repeat('z', 2048)` for every row, as a secondary defense against sampling randomness missing the long row even when the row count has been reported correctly.
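To see why the all-long data helps, consider a deliberately simplified model (an assumption for illustration only, not a description of how Doris actually selects sample rows): if `sample rows 3` drew 3 of the 5 rows uniformly at random without replacement, the old data with a single long row would miss it with probability C(4,3)/C(5,3) = 4/10 = 0.4, while data in which every row is long can never miss. A minimal Java sketch of that arithmetic, with hypothetical class and method names:

```java
// Hypothetical helper illustrating the sampling-miss arithmetic under an
// ASSUMED uniform, without-replacement row sampler (not Doris's real sampler).
class SampleMissOdds {
    // Binomial coefficient C(n, k) via the multiplicative formula;
    // each intermediate value is itself C(n - k + i, i), so division is exact.
    static long choose(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Probability that a sample of `sampleSize` rows drawn from `totalRows`
    // rows contains none of the `longRows` long rows.
    static double missProbability(int totalRows, int longRows, int sampleSize) {
        return (double) choose(totalRows - longRows, sampleSize)
                / choose(totalRows, sampleSize);
    }

    public static void main(String[] args) {
        // Old Case 5 data: 1 long row out of 5, sample 3 rows.
        System.out.println(missProbability(5, 1, 3));
        // New Case 5 data: all 5 rows are long.
        System.out.println(missProbability(5, 5, 3));
    }
}
```

Under this model the first call prints `0.4` and the second prints `0.0`; the point is only that making every row long removes the dependence on which rows are picked, whatever the real sampler does.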
Co-authored-by: Claude <[email protected]>
---
.../org/apache/doris/regression/suite/Suite.groovy | 27 ++++++++++++++++++++++
.../statistics/test_analyze_long_string.groovy | 12 ++++++----
2 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy b/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy
index 3fc37eb7da3..0dc8f9d1737 100644
--- a/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy
+++ b/regression-test/framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy
@@ -2657,6 +2657,33 @@ class Suite implements GroovyInterceptable {
         return false;
     }
 
+    // Wait until the table row count reported by BE reaches the expected value.
+    // This is necessary before running sample analyze, because if row count is 0
+    // at task creation time, OlapAnalysisTask.doExecute() returns early without
+    // collecting any statistics, causing the analyze task to finish with an empty
+    // message instead of the expected skip reason.
+    void waitRowCountReady(String db, String table, long expectedRowCount) {
+        Awaitility.await().atMost(120, TimeUnit.SECONDS)
+                .pollInterval(3, TimeUnit.SECONDS).until {
+            def data = sql_return_maparray """SHOW DATA FROM ${db}.${table};"""
+            logger.info("SHOW DATA FROM ${db}.${table}: ${data}")
+            if (data.size() > 0) {
+                // Row 0 is the base index row which always has RowCount.
+                // The last row is "Total" whose RowCount may be empty.
+                def rc = data[0].RowCount
+                // RowCount may be empty string if BE hasn't reported yet.
+                if (rc == null || rc == '') {
+                    return false
+                }
+                def rowCount = (rc as long)
+                if (rowCount >= expectedRowCount) {
+                    return true
+                }
+            }
+            return false
+        }
+    }
+
     // Given tables to decide whether the table partition row count statistic is ready or not
     boolean is_partition_statistics_ready(db, tables) {
         boolean isReady = true;
diff --git a/regression-test/suites/statistics/test_analyze_long_string.groovy b/regression-test/suites/statistics/test_analyze_long_string.groovy
index a234f44bad1..f9c0df470c2 100644
--- a/regression-test/suites/statistics/test_analyze_long_string.groovy
+++ b/regression-test/suites/statistics/test_analyze_long_string.groovy
@@ -162,6 +162,8 @@ suite("test_analyze_long_string", "nonConcurrent") {
     sql """insert into test_analyze_long_string_sample values(4, 'dd', 'short3')"""
     sql """insert into test_analyze_long_string_sample values(5, 'ee', 'short4')"""
+    waitRowCountReady("test_analyze_long_string", "test_analyze_long_string_sample", 5)
+
     setFeConfigTemporary([statistics_max_string_column_length: 1024]) {
         sql """analyze table test_analyze_long_string_sample with sample percent 100"""
         def jobId = findJobId("internal", "test_analyze_long_string", "test_analyze_long_string_sample")
@@ -232,10 +234,12 @@ suite("test_analyze_long_string", "nonConcurrent") {
         PROPERTIES ("replication_allocation" = "tag.location.default: 1");
     """
     sql """insert into test_analyze_long_string_duj1 values(1, 'aa', repeat('z', 2048))"""
-    sql """insert into test_analyze_long_string_duj1 values(2, 'bb', 'short1')"""
-    sql """insert into test_analyze_long_string_duj1 values(3, 'cc', 'short2')"""
-    sql """insert into test_analyze_long_string_duj1 values(4, 'dd', 'short3')"""
-    sql """insert into test_analyze_long_string_duj1 values(5, 'ee', 'short4')"""
+    sql """insert into test_analyze_long_string_duj1 values(2, 'bb', repeat('z', 2048))"""
+    sql """insert into test_analyze_long_string_duj1 values(3, 'cc', repeat('z', 2048))"""
+    sql """insert into test_analyze_long_string_duj1 values(4, 'dd', repeat('z', 2048))"""
+    sql """insert into test_analyze_long_string_duj1 values(5, 'ee', repeat('z', 2048))"""
+
+    waitRowCountReady("test_analyze_long_string", "test_analyze_long_string_duj1", 5)
     setFeConfigTemporary([statistics_max_string_column_length: 1024]) {
         GetDebugPoint().enableDebugPointForAllFEs('OlapAnalysisTask.useDUJ1Template')