This is an automated email from the ASF dual-hosted git repository.
englefly pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new f621d8ea65 [doc](stats) full auto analyze docs #21918
f621d8ea65 is described below
commit f621d8ea65af21a89fb8efc408854ec44885adc2
Author: AKIRA <[email protected]>
AuthorDate: Wed Jul 26 17:30:10 2023 +0800
[doc](stats) full auto analyze docs #21918
Add description for some fe config about analyze
---
docs/en/docs/query-acceleration/statistics.md | 16 ++++++++++++++--
docs/zh-CN/docs/query-acceleration/statistics.md | 15 +++++++++++++--
2 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/docs/en/docs/query-acceleration/statistics.md b/docs/en/docs/query-acceleration/statistics.md
index 3280bfca15..7cbed02b62 100644
--- a/docs/en/docs/query-acceleration/statistics.md
+++ b/docs/en/docs/query-acceleration/statistics.md
@@ -873,6 +873,18 @@ User can delete automatic/periodic Analyze jobs based on job ID.
DROP ANALYZE JOB [JOB_ID]
```
-## ANALYZE configuration item
+## Full auto analyze
-To be added.
+The `enable_full_auto_analyze` option controls whether full auto analyze is enabled. When it is enabled, Doris automatically analyzes all databases except some internal ones (`information_db`, etc.) and ignores the `AUTO`/`PERIOD` jobs. The default is `true`.
+
+## Other ANALYZE configuration items
+
+
+| conf | comment [...]
+|---|--- [...]
+| statistics_sql_parallel_exec_instance_num | Control the number of concurrent instances/pipeline tasks on the BE side for each statistics collection SQL. [...]
+| statistics_sql_mem_limit_in_bytes | Control the amount of BE memory that each statistics SQL can occupy. [...]
+| statistics_simultaneously_running_task_num | The number of concurrent AnalyzeTasks that can be executed. [...]
+| analyze_task_timeout_in_minutes | Execution time limit for an AnalyzeTask; a task that exceeds it is cancelled | 2 hours |
+| full_auto_analyze_start_time/full_auto_analyze_end_time | Execution time range for full auto analyze; it is only triggered within this range | 00:00:00-23:59:59 |
+| stats_cache_size | The actual memory taken by the stats cache depends heavily on the characteristics of the data, because the average size of the max/min literals and the bucket count of the histograms vary widely across datasets and scenarios. The JVM version and similar factors also have an influence, though less than the data itself. As a reference point, consider a stats cache holding 10_0000 items, where each item's max/min literal has an average length of 32, the average column name length is 16, and eac [...]
diff --git a/docs/zh-CN/docs/query-acceleration/statistics.md b/docs/zh-CN/docs/query-acceleration/statistics.md
index f90e02b9b5..a4db5415a4 100644
--- a/docs/zh-CN/docs/query-acceleration/statistics.md
+++ b/docs/zh-CN/docs/query-acceleration/statistics.md
@@ -934,6 +934,17 @@ mysql> DROP STATS stats_test.example_tbl(city, age, sex);
DROP ANALYZE JOB [JOB_ID]
```
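-## ANALYZE configuration item
+## Full auto analyze
-To be added.
+The `enable_full_auto_analyze` option controls whether full auto analyze is enabled. When it is enabled, Doris automatically analyzes all databases except some internal ones (such as `information_db`) and ignores the AUTO/PERIOD jobs. The default is true.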
+
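+## Other ANALYZE configuration items
+
+| conf | comment | default value |
+|---|---|---|
+| statistics_sql_parallel_exec_instance_num | Controls the number of concurrent instances/pipeline tasks on the BE side for each statistics collection SQL | 1 |
+| statistics_sql_mem_limit_in_bytes | Controls the amount of BE memory that each statistics SQL can occupy | 2L * 1024 * 1024 * 1024 (2GiB) |
+| statistics_simultaneously_running_task_num | The number of AnalyzeTasks that can run concurrently | 10 |
+| analyze_task_timeout_in_minutes | Execution timeout for an AnalyzeTask | 2 hours |
+| full_auto_analyze_start_time/full_auto_analyze_end_time | Execution time range for full auto analyze; it is not triggered outside this range | 00:00:00-23:59:59 |
+| stats_cache_size | The actual memory taken by the stats cache depends heavily on the characteristics of the data, because the average size of the max/min values and the bucket count of the histograms vary widely across datasets and scenarios. The JVM version and similar factors also have an influence. As a reference point, consider a stats cache holding 10_0000 items, where each item's max/min value has an average length of 32, the average column name length is 16, and every column has a histogram with 128 buckets. In this case the stats cache takes 911.954833984MiB of memory in total. Without histograms, it takes 61.2777404785MiB in total. Analyzing columns with very large string values is strongly discouraged, because it may cause the FE to run out of memory (OOM). | 10_0000 |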
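
The `stats_cache_size` figures quoted in the zh-CN hunk above can be turned into a rough per-item cost. The following is only a back-of-the-envelope sketch, not part of the commit: it reads `10_0000` as 100,000 items, and it assumes memory scales linearly with the item count, which the docs do not claim. The function names are hypothetical.

```python
MIB = 1024 * 1024

# Figures quoted in the stats_cache_size row (reading 10_0000 as 100,000 items).
REFERENCE_ITEMS = 100_000
TOTAL_WITH_HISTOGRAM_MIB = 911.954833984      # 128-bucket histogram per column
TOTAL_WITHOUT_HISTOGRAM_MIB = 61.2777404785   # no histograms


def bytes_per_item(total_mib: float, items: int = REFERENCE_ITEMS) -> float:
    """Average bytes per cached item for the quoted reference workload."""
    return total_mib * MIB / items


def estimate_cache_mib(items: int, with_histogram: bool) -> float:
    """Estimate cache memory for `items` entries.

    ASSUMPTION: linear scaling from the reference point; real usage depends
    on literal sizes, bucket counts, and the JVM, as the docs point out.
    """
    total = TOTAL_WITH_HISTOGRAM_MIB if with_histogram else TOTAL_WITHOUT_HISTOGRAM_MIB
    return total * items / REFERENCE_ITEMS
```

Under these assumptions an item costs roughly 9.3 KiB with a histogram and roughly 0.6 KiB without, so histograms dominate the footprint of the cache.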