This is an automated email from the ASF dual-hosted git repository.
lijibing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 9f5b4c63d0b [opt](staticstis) use count(1) for rowCount when scan full
table (#58153)
9f5b4c63d0b is described below
commit 9f5b4c63d0bd1498fcae9f7d398d70a06803e8e9
Author: yujun <[email protected]>
AuthorDate: Thu Nov 20 17:59:22 2025 +0800
[opt](staticstis) use count(1) for rowCount when scan full table (#58153)
### What problem does this PR solve?
When sampling, the analysis task uses table.getRowCount() as rowsCount, but
table.getRowCount() may be stale because it depends on the BE's report, so
rowsCount can end up smaller than ndv.
If 10 * rowsCount < ndv, the analyze SQL fails.
This makes the regression test statistics/analyze_stats.groovy unstable, and
it fails with the following error:
```
Exception:
java.sql.SQLException: errCode = 2, detailMessage = Failed to analyze following columns:[id] Reasons: java.lang.RuntimeException: ColStatsData is invalid, skip analyzing. ('1763112020393--1-id',0,1763112019723,1763112020393,-1,'id',null,1,16,0,'1','201',64,'2025-11-14 17:41:14','105 :0.06 ;104 :0.06 ;103 :0.06 ;102 :0.06 ;101 :0.06 ;10 :0.06 ;9 :0.06 ;8 :0.06 ;7 :0.06 ;6 :0.06')
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:371)
at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
at org.apache.doris.regression.util.JdbcUtils$_executeToList_closure1.doCall(JdbcUtils.groovy:47)
at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)
```
So when sampling ends up scanning the whole table, we use count(1) as
rowsCount.
Note that this replacement does not increase the execution cost, because the
statistics SQL already contains `count(1)`.
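A minimal sketch of the idea, for illustration only. `RowCountParamSketch`, `buildParams`, and `passesNdvCheck` are hypothetical names, not the actual `OlapAnalysisTask` code, and how the task decides that it is scanning the full table is not shown here, so the sketch takes it as a boolean. The numbers in `main` are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class RowCountParamSketch {
    // Hypothetical helper: picks the "rowCount" template parameter.
    static Map<String, String> buildParams(boolean scanFullTable, long reportedRowCount) {
        Map<String, String> params = new HashMap<>();
        if (scanFullTable) {
            // Full scan: let the statistics SQL count the rows it actually reads.
            params.put("rowCount", "COUNT(1)");
        } else {
            // Sampled scan: use the reported (possibly stale) table row count.
            params.put("rowCount", String.valueOf(reportedRowCount));
        }
        return params;
    }

    // Mirrors the condition described above: stats are rejected when 10 * rowsCount < ndv.
    static boolean passesNdvCheck(long rowsCount, long ndv) {
        return 10 * rowsCount >= ndv;
    }

    public static void main(String[] args) {
        System.out.println(buildParams(true, 1).get("rowCount"));
        System.out.println(buildParams(false, 10000).get("rowCount"));
        // A stale row count of 1 against an ndv of 16 would be rejected.
        System.out.println(passesNdvCheck(1, 16));
        // A row count taken from the scan itself cannot be below the ndv.
        System.out.println(passesNdvCheck(16, 16));
    }
}
```

Because a count taken from the same scan that computes the NDV is always at least as large as the NDV, the check cannot fail in the full-scan case.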
---
.../src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java | 2 ++
.../test/java/org/apache/doris/statistics/OlapAnalysisTaskTest.java | 5 +++++
2 files changed, 7 insertions(+)
diff --git a/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java b/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java
index 6d3b5a3a40e..2f73ae87382 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java
@@ -271,6 +271,8 @@ public class OlapAnalysisTask extends BaseAnalysisTask {
params.put("scaleFactor", "1");
params.put("sampleHints", "");
params.put("ndvFunction", "ROUND(NDV(`${colName}`) * ${scaleFactor})");
+ // For full table scan, use COUNT(1) for table row count.
+ params.put("rowCount", "COUNT(1)");
params.put("rowCount2", "(SELECT COUNT(1) FROM cte1 WHERE `${colName}` IS NOT NULL)");
scanFullTable = true;
return;
diff --git a/fe/fe-core/src/test/java/org/apache/doris/statistics/OlapAnalysisTaskTest.java b/fe/fe-core/src/test/java/org/apache/doris/statistics/OlapAnalysisTaskTest.java
index c8f9b397479..fa1879db5f3 100644
--- a/fe/fe-core/src/test/java/org/apache/doris/statistics/OlapAnalysisTaskTest.java
+++ b/fe/fe-core/src/test/java/org/apache/doris/statistics/OlapAnalysisTaskTest.java
@@ -375,6 +375,11 @@ public class OlapAnalysisTaskTest {
Assertions.assertEquals("", params.get("sampleHints"));
Assertions.assertEquals("ROUND(NDV(`${colName}`) * ${scaleFactor})",
params.get("ndvFunction"));
Assertions.assertNull(params.get("preAggHint"));
+ Assertions.assertEquals("COUNT(1)", params.get("rowCount"));
+ params.clear();
+
+ task.getSampleParams(params, 10000);
+ Assertions.assertEquals("10000", params.get("rowCount"));
params.clear();
new MockUp<OlapTable>() {