This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new ad5e34ab9c [Doc](statistics) supplement stats doc (regression test and automatic collection) (#20071)
ad5e34ab9c is described below
commit ad5e34ab9c7651f70af8d6227e973e2719da5888
Author: ElvinWei <[email protected]>
AuthorDate: Sat Jun 3 17:25:33 2023 +0800
[Doc](statistics) supplement stats doc (regression test and automatic collection) (#20071)
---
docs/en/docs/query-acceleration/statistics.md | 47 +++++++++++++++++++++-
docs/zh-CN/docs/query-acceleration/statistics.md | 45 ++++++++++++++++++++-
.../java/org/apache/doris/statistics/README.md | 35 ++++++++++++++++
3 files changed, 125 insertions(+), 2 deletions(-)
diff --git a/docs/en/docs/query-acceleration/statistics.md b/docs/en/docs/query-acceleration/statistics.md
index c9106d9e75..e769177753 100644
--- a/docs/en/docs/query-acceleration/statistics.md
+++ b/docs/en/docs/query-acceleration/statistics.md
@@ -403,7 +403,52 @@ mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH PERIOD 86400;
#### Automatic collection
-To be added.
+When a table changes, its statistics may become "invalid", which can cause the optimizer to choose a wrong execution plan.
+
+Table statistics may become invalid for the following reasons:
+
+- New column: the new column has no statistics
+- Column change: the original statistics are no longer usable
+- New partition: the new partition has no statistics
+- Partition change: the original statistics are invalid
+- Data change (insert | delete | update): the statistics become inaccurate
+
+The main operations involved are:
+
+- update: updates data
+- delete: deletes data
+- drop: drops a partition
+- load: imports data and adds partitions
+- insert: inserts data and adds partitions
+- alter: changes columns, changes partitions, or adds partitions
+
+When a database, table, partition, or column is dropped, the system automatically clears the statistics that have become invalid. Reordering columns or changing a column's type does not affect statistics.
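The rules above amount to a small lookup from operations to the reasons statistics go stale, plus a rule for schema changes. The following Java sketch is a hypothetical model written only for illustration, not code from `org.apache.doris.statistics`; the enum names (`Cause`, `Operation`, `SchemaChange`, `Effect`) and the exact mapping from each operation to its causes are assumptions drawn from the two lists and the paragraph above.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the invalidation rules above; not Doris source code.
class StatsInvalidation {

    // Reasons statistics can stop reflecting the table (from the first list).
    enum Cause { NEW_COLUMN, COLUMN_CHANGE, NEW_PARTITION, PARTITION_CHANGE, DATA_CHANGE }

    // Operations that can produce those reasons (from the second list).
    enum Operation { UPDATE, DELETE, DROP, LOAD, INSERT, ALTER }

    // Schema-level changes whose effect the text states explicitly.
    enum SchemaChange { DROP_DATABASE, DROP_TABLE, DROP_PARTITION, DROP_COLUMN, REORDER_COLUMNS, MODIFY_COLUMN_TYPE }

    enum Effect { CLEARED, UNAFFECTED }

    // Assumed mapping: which causes each operation can produce.
    private static final Map<Operation, Set<Cause>> CAUSES = Map.of(
            Operation.UPDATE, EnumSet.of(Cause.DATA_CHANGE),
            Operation.DELETE, EnumSet.of(Cause.DATA_CHANGE),
            Operation.DROP, EnumSet.of(Cause.PARTITION_CHANGE),
            Operation.LOAD, EnumSet.of(Cause.DATA_CHANGE, Cause.NEW_PARTITION),
            Operation.INSERT, EnumSet.of(Cause.DATA_CHANGE, Cause.NEW_PARTITION),
            Operation.ALTER, EnumSet.of(Cause.COLUMN_CHANGE, Cause.PARTITION_CHANGE, Cause.NEW_PARTITION));

    static Set<Cause> causesOf(Operation op) {
        // Return a copy so callers cannot modify the shared table.
        return EnumSet.copyOf(CAUSES.get(op));
    }

    static Effect effectOf(SchemaChange change) {
        switch (change) {
            case REORDER_COLUMNS:
            case MODIFY_COLUMN_TYPE:
                // Per the text, these changes leave statistics as they are.
                return Effect.UNAFFECTED;
            default:
                // Dropping a database, table, partition, or column clears its statistics.
                return Effect.CLEARED;
        }
    }

    public static void main(String[] args) {
        System.out.println(causesOf(Operation.LOAD));           // a data change plus a new partition
        System.out.println(effectOf(SchemaChange.DROP_COLUMN));  // statistics are cleared
    }
}
```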
+
+The system decides whether to recollect statistics based on the health of the table (as defined above). With a health threshold configured, the system collects statistics on a table again once its health falls below that value. Put simply, for a table whose statistics have already been collected, data growing or shrinking in a partition, or a partition being added or dropped, may trigger automatic collection. After the statistics are collected again, the statistics a [...]
+
+Currently, statistics are collected automatically only for tables that the user has configured for automatic collection; other tables are not collected automatically.
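The decision described in the two paragraphs above (opt-in plus a health threshold) can be sketched as follows. This is a hypothetical Java illustration, not the Doris implementation: it assumes health is the percentage of rows unchanged since the last collection (`100 * unchanged / rowCountAtLastAnalyze`), and the names `TableState`, `health`, `needsReanalyze`, and the threshold value of 90 are all made up for the example.

```java
// Hypothetical sketch of a health-threshold check; not Doris source code.
class AutoAnalyzeSketch {

    // Minimal per-table bookkeeping assumed for this sketch.
    static final class TableState {
        final String name;
        final boolean autoCollectEnabled;        // user opted in (e.g. WITH AUTO)
        final long rowCountAtLastAnalyze;        // rows when statistics were last collected
        final long rowsChangedSinceLastAnalyze;  // rows inserted, deleted, or updated since then

        TableState(String name, boolean autoCollectEnabled,
                   long rowCountAtLastAnalyze, long rowsChangedSinceLastAnalyze) {
            this.name = name;
            this.autoCollectEnabled = autoCollectEnabled;
            this.rowCountAtLastAnalyze = rowCountAtLastAnalyze;
            this.rowsChangedSinceLastAnalyze = rowsChangedSinceLastAnalyze;
        }
    }

    // Assumed definition: percentage of rows unchanged since the last collection, in [0, 100].
    static int health(TableState t) {
        if (t.rowCountAtLastAnalyze == 0) {
            // An empty table stays healthy until rows arrive.
            return t.rowsChangedSinceLastAnalyze == 0 ? 100 : 0;
        }
        long unchanged = Math.max(0, t.rowCountAtLastAnalyze - t.rowsChangedSinceLastAnalyze);
        return (int) (100 * unchanged / t.rowCountAtLastAnalyze);
    }

    // Only opted-in tables whose health is below the threshold are recollected.
    static boolean needsReanalyze(TableState t, int healthThreshold) {
        return t.autoCollectEnabled && health(t) < healthThreshold;
    }

    public static void main(String[] args) {
        int threshold = 90; // hypothetical threshold
        TableState optedIn = new TableState("example_tbl", true, 1_000, 200);
        TableState notOptedIn = new TableState("other_tbl", false, 1_000, 200);
        System.out.println(health(optedIn));                        // 80
        System.out.println(needsReanalyze(optedIn, threshold));     // true
        System.out.println(needsReanalyze(notOptedIn, threshold));  // false
    }
}
```

With these assumed numbers, a table that had 1,000 rows at its last collection and has since seen 200 changed rows has a health of 80, so an opted-in table is recollected at a threshold of 90, while a table that did not opt in is skipped regardless of its health.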
+
+Example:
+
+- To automatically collect statistics for the `example_tbl` table, use the following syntax:
+
+```SQL
+-- use with auto
+mysql> ANALYZE TABLE stats_test.example_tbl WITH AUTO;
++--------+
+| job_id |
++--------+
+| 52539 |
++--------+
+
+-- configure automatic
+mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("automatic" = "true");
++--------+
+| job_id |
++--------+
+| 52565 |
++--------+
+```
### Manage job
diff --git a/docs/zh-CN/docs/query-acceleration/statistics.md b/docs/zh-CN/docs/query-acceleration/statistics.md
index c0091507b7..a232d87eb0 100644
--- a/docs/zh-CN/docs/query-acceleration/statistics.md
+++ b/docs/zh-CN/docs/query-acceleration/statistics.md
@@ -434,7 +434,50 @@ mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH PERIOD 86400;
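#### Automatic collection
-To be added.
+When a table changes, its statistics may become "invalid", which can cause the optimizer to choose a wrong execution plan.
+
+Table statistics may become invalid for the following reasons:
+
+- New column: the new column has no statistics
+- Column change: the original statistics are no longer usable
+- New partition: the new partition has no statistics
+- Partition change: the original statistics are invalid
+- Data change (insert | delete | update): the statistics become inaccurate
+
+The main operations involved are:
+
+- update: updates data
+- delete: deletes data
+- drop: drops a partition
+- load: imports data and adds partitions
+- insert: inserts data and adds partitions
+- alter: changes columns, changes partitions, or adds partitions
+
+When a database, table, partition, or column is dropped, the system automatically clears the statistics that have become invalid. Reordering columns or changing a column's type does not affect statistics.
+
+The system decides whether to recollect statistics based on the health of the table (as defined above). With a health threshold configured, the system collects statistics on a table again once its health falls below that value. Put simply, for a table whose statistics have already been collected, data growing or shrinking in a partition, or a partition being added or dropped, may trigger automatic collection; after recollection, the table's statistics and health are updated. Currently, statistics are collected automatically only for tables that the user has configured for automatic collection; other tables are not.
+
+Example:
+
+- To automatically collect statistics for the `example_tbl` table, use the following syntax: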
+
+```SQL
+-- use WITH AUTO
+mysql> ANALYZE TABLE stats_test.example_tbl WITH AUTO;
++--------+
+| job_id |
++--------+
+| 52539 |
++--------+
+
+-- configure automatic
+mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("automatic" = "true");
++--------+
+| job_id |
++--------+
+| 52565 |
++--------+
+```
### Manage job
diff --git a/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
index e3a577528a..ef9340b281 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
+++ b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
@@ -116,6 +116,41 @@ end
# Test
+The regression tests currently cover the following areas:
+
+- Analyze stats: verifies the `ANALYZE` statement and its related features. Because some of these features depend on other factors (such as the timing of system metadata reporting) and may be unstable, most of these tests are placed in p1.
+- Manage stats: verifies injecting, deleting, displaying, and other operations on statistics.
+
+For more details, see [statistics_p0](https://github.com/apache/doris/tree/master/regression-test/suites/statistics) and [statistics_p1](https://github.com/apache/doris/tree/master/regression-test/suites/statistics_p1).
+
+## Analyze stats
+
+p0 tests:
+
+1. Universal analysis
+
+p1 tests:
+
+1. Universal analysis
+2. Sampled analysis
+3. Incremental analysis
+4. Automatic analysis
+5. Periodic analysis
+
+## Manage stats
+
+p0 tests:
+
+1. Alter table stats
+2. Show table stats
+3. Alter column stats
+4. Show column stats
+5. Show column histogram
+6. Drop column stats
+7. Drop expired stats
+
+Any change to the statistics module must pass all of the above cases.
+
# Feature note
20230508: