[doris] branch master updated: [doc](stats) full auto analyze docs #21918

englefly Wed, 26 Jul 2023 02:31:12 -0700

This is an automated email from the ASF dual-hosted git repository.

englefly pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git



The following commit(s) were added to refs/heads/master by this push:
     new f621d8ea65 [doc](stats) full auto analyze docs #21918
f621d8ea65 is described below

commit f621d8ea65af21a89fb8efc408854ec44885adc2
Author: AKIRA <[email protected]>
AuthorDate: Wed Jul 26 17:30:10 2023 +0800

    [doc](stats) full auto analyze docs #21918
    
    Add description for some fe config about analyze
---
 docs/en/docs/query-acceleration/statistics.md    | 16 ++++++++++++++--
 docs/zh-CN/docs/query-acceleration/statistics.md | 15 +++++++++++++--
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/docs/en/docs/query-acceleration/statistics.md b/docs/en/docs/query-acceleration/statistics.md
index 3280bfca15..7cbed02b62 100644
--- a/docs/en/docs/query-acceleration/statistics.md
+++ b/docs/en/docs/query-acceleration/statistics.md
@@ -873,6 +873,18 @@ User can delete automatic/periodic Analyze jobs based on job ID.
 DROP ANALYZE JOB [JOB_ID]
 ```
 
-## ANALYZE configuration item
+## Full auto analyze
 
-To be added.
+Use the option `enable_full_auto_analyze` to control whether full auto analyze is enabled. When it is enabled, Doris automatically analyzes all databases except some internal ones (information_db, etc.) and ignores the `AUTO`/`PERIOD` jobs. The default is `true`.
+
+## Other ANALYZE configuration items
+
+
+| conf | comment [...]
+|---|--- [...]
+| statistics_sql_parallel_exec_instance_num | Controls the number of concurrent instances/pipeline tasks on the BE side for each statistics collection SQL. [...]
+| statistics_sql_mem_limit_in_bytes | Controls the amount of BE memory that each statistics SQL can occupy. [...]
+| statistics_simultaneously_running_task_num | The number of AnalyzeTasks that can be executed concurrently. [...]
+| analyze_task_timeout_in_minutes | Execution time limit for an AnalyzeTask; a task that times out is cancelled. | 2hours |
+| full_auto_analyze_start_time/full_auto_analyze_end_time | Execution time range for full auto analyze; full auto analyze is only triggered within this range. | 00:00:00-23:59:59 |
+| stats_cache_size | The actual memory size taken by the stats cache depends heavily on the characteristics of the data, because the average size of the max/min literals and the number of histogram buckets differ greatly across datasets and scenarios. The JVM version and similar factors also have an influence, though not as much as the data itself. As a reference, here is the memory size taken by a stats cache with 10_0000 items. Each item's average max/min literal length is 32, the average column name length is 16, and eac [...]
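
The `full_auto_analyze_start_time`/`full_auto_analyze_end_time` row describes a time window outside of which full auto analyze is not triggered. A minimal Python sketch of such a window check follows. It is not Doris's actual implementation: the function name `in_analyze_window` is hypothetical, and treating a start time later than the end time as a window that wraps past midnight is an assumption the source does not confirm.

```python
from datetime import time


def in_analyze_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` lies inside the inclusive window [start, end]."""
    if start <= end:
        # Ordinary window within a single day, e.g. 00:00:00-23:59:59.
        return start <= now <= end
    # Assumption: start > end means the window wraps past midnight,
    # e.g. 22:00:00-02:00:00 covers late evening and early morning.
    return now >= start or now <= end


# The default range from the table covers the whole day.
DEFAULT_START = time(0, 0, 0)
DEFAULT_END = time(23, 59, 59)
```

With the default range, any time of day passes the check, which matches the documented behaviour that full auto analyze can be triggered at any time unless the range is narrowed.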
diff --git a/docs/zh-CN/docs/query-acceleration/statistics.md b/docs/zh-CN/docs/query-acceleration/statistics.md
index f90e02b9b5..a4db5415a4 100644
--- a/docs/zh-CN/docs/query-acceleration/statistics.md
+++ b/docs/zh-CN/docs/query-acceleration/statistics.md
@@ -934,6 +934,17 @@ mysql> DROP STATS stats_test.example_tbl(city, age, sex);
 DROP ANALYZE JOB [JOB_ID]
 ```
 
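-## ANALYZE configuration item
+## Full auto analyze
 
-To be added.
+Use the option `enable_full_auto_analyze` to control whether full auto analyze is enabled. When it is enabled, Doris automatically analyzes all databases except some internal ones (such as `information_db`) and ignores AUTO/PERIOD jobs. By default, this option is true.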
+
+## Other ANALYZE configuration items
+
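+| conf | comment | default value |
+|---|---|---|
+| statistics_sql_parallel_exec_instance_num | Controls the number of concurrent instances/pipeline tasks on the BE side for each statistics collection SQL | 1 |
+| statistics_sql_mem_limit_in_bytes | Controls the amount of BE memory that each statistics SQL can occupy | 2L * 1024 * 1024 * 1024 (2GiB) |
+| statistics_simultaneously_running_task_num | The number of AnalyzeTasks that can be executed concurrently | 10 |
+| analyze_task_timeout_in_minutes | Execution timeout for an AnalyzeTask | 2hours |
+| full_auto_analyze_start_time/full_auto_analyze_end_time | Execution time range for full auto analyze; full auto analyze is not triggered outside this range | 00:00:00-23:59:59 |
+| stats_cache_size | The actual memory size taken by the statistics cache depends heavily on the characteristics of the data, because the average size of the max/min values and the number of histogram buckets differ greatly across datasets and scenarios. Factors such as the JVM version also have an influence. As a reference, here is the memory size taken by the statistics cache when it holds 10_0000 items. Each item's average max/min value length is 32, the average column name length is 16, and every column has a histogram with 128 buckets. In this case the statistics cache takes 911.954833984MiB in total. Without histograms, the statistics cache takes 61.2777404785MiB in total. Analyzing columns with very large string values is strongly discouraged, because it may cause the FE to run out of memory (OOM). | 10_0000 |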
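
The `stats_cache_size` figures above (911.954833984MiB with 128-bucket histograms and 61.2777404785MiB without, for 10_0000 items) can be turned into a rough per-item cost. The Python sketch below does that arithmetic. It reads `10_0000` as 100,000 items and assumes memory scales linearly with the item count; neither point is stated by the source, and the helper names are hypothetical.

```python
MIB = 1024 * 1024

# Figures quoted in the stats_cache_size description.
MEASURED_ITEMS = 100_000  # assumption: "10_0000" means 100,000 items
MEASURED_MIB_WITH_HISTOGRAM = 911.954833984
MEASURED_MIB_WITHOUT_HISTOGRAM = 61.2777404785


def bytes_per_item(total_mib: float, items: int = MEASURED_ITEMS) -> float:
    """Average number of bytes one cached item took in the measurement."""
    return total_mib * MIB / items


def estimate_cache_mib(items: int, with_histogram: bool = True) -> float:
    """Estimate cache memory in MiB, assuming linear scaling with item count."""
    measured = (
        MEASURED_MIB_WITH_HISTOGRAM
        if with_histogram
        else MEASURED_MIB_WITHOUT_HISTOGRAM
    )
    return measured * items / MEASURED_ITEMS


if __name__ == "__main__":
    print(f"{bytes_per_item(MEASURED_MIB_WITH_HISTOGRAM):.2f} bytes/item with histogram")
    print(f"{bytes_per_item(MEASURED_MIB_WITHOUT_HISTOGRAM):.2f} bytes/item without histogram")
```

Under these assumptions one item costs about 9562.54 bytes with a histogram and about 642.54 bytes without, so histograms account for most of the cache footprint. Real usage depends on the data, as the description itself stresses, so these numbers are only a starting point for sizing.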

