This is an automated email from the ASF dual-hosted git repository.

morrysnow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new bd23db762d [minor](stats) Add doc for stats framework (#19311)
bd23db762d is described below

commit bd23db762d913bd186a46eb6bf50d46210e99b82
Author: AKIRA <[email protected]>
AuthorDate: Sat May 6 14:30:55 2023 +0900

    [minor](stats) Add doc for stats framework (#19311)
---
 .../java/org/apache/doris/statistics/README.md     | 116 +++++++++++++++++++++
 1 file changed, 116 insertions(+)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
new file mode 100644
index 0000000000..9f4e9034d7
--- /dev/null
+++ b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
@@ -0,0 +1,116 @@
+- [Requirements](#requirements)
+    - [Basic](#basic)
+    - [Advanced (Not finished yet)](#advanced-not-finished-yet)
+    - [Specification](#specification)
+    - [Compatibility](#compatibility)
+        - [Function compatibility](#function-compatibility)
+        - [Version compatibility](#version-compatibility)
+- [Implementation](#implementation)
+    - [Main class](#main-class)
+    - [Analyze execution flow](#analyze-execution-flow)
+    - [Load execution flow](#load-execution-flow)
+- [Configure options](#configure-options)
+- [User interface](#user-interface)
+- [Test](#test)
+
+# Requirements
+
+## Basic
+
+Provide the data the optimizer needs to estimate and compare the cost of candidate plans: for each column, the count, ndv (number of distinct values), null_count, min, max, data_size, and a histogram; for each table, the number of rows.
+
+## Advanced (Not finished yet)
+
+Support incremental collection and automatic collection.
+
+## Specification
+
+## Compatibility
+
+### Function compatibility
+
+Does not conflict with any other function.
+
+### Version compatibility
+
+There may be compatibility issues if there are changes to the schema of the stats table in the future.
+
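The Basic requirement and the compatibility note are linked: collected statistics are read back from the stats table and mapped to per-column fields, so a schema change can break that mapping. The minimal Java sketch below assumes a hypothetical `ColumnStats` record and a made-up six-column row layout (not the real Doris stats table schema, and the histogram is omitted). It shows one way a reader can detect an unexpected schema instead of silently mis-mapping values.

```java
import java.util.List;

/**
 * Hypothetical per-column statistics holding the fields listed under "Basic"
 * (histogram omitted for brevity). Not the actual Doris class.
 */
record ColumnStats(long count, long ndv, long nullCount, String min, String max, long dataSize) {

    // Assumed row layout of the stats table; purely illustrative.
    static final List<String> EXPECTED_COLUMNS =
            List.of("count", "ndv", "null_count", "min", "max", "data_size");

    /** Maps one stats-table row to ColumnStats by position. */
    static ColumnStats fromRow(List<String> header, List<String> row) {
        // A positional mapping would silently mis-assign values after a schema
        // change, so verify the header first and fail loudly instead.
        if (!header.equals(EXPECTED_COLUMNS) || row.size() != header.size()) {
            throw new IllegalArgumentException("unexpected stats table schema: " + header);
        }
        return new ColumnStats(
                Long.parseLong(row.get(0)),
                Long.parseLong(row.get(1)),
                Long.parseLong(row.get(2)),
                row.get(3), // min/max kept as strings since columns can be of any type
                row.get(4),
                Long.parseLong(row.get(5)));
    }
}
```

Checking the header up front turns a schema mismatch into an explicit error, which a caller can then treat as "stats unavailable".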
+# Implementation
+
+
+## Main class
+
+|Class name|Function|
+|---|---|
+|AnalyzeStmt|Constructed by parsing the user's SQL. Each AnalyzeStmt corresponds to one Job; a Job can have multiple Tasks, each responsible for collecting statistics on one column.|
+|AnalysisManager|Mainly responsible for managing analyze jobs and tasks, including their creation, execution, cancellation, and status updates.|
+|StatisticCache|Caches the collected statistics on demand.|
+|StatisticCacheLoader|Implements the load logic that is triggered when `StatsCalculator#computeScan` cannot find a column's stats in the cache.|
+|AnalysisTaskExecutor|Executes analyze jobs.|
+|AnalysisTaskWrapper|Encapsulates an `AnalysisTask` and extends `FutureTask`, overriding some of its methods to update task state.|
+|AnalysisTaskScheduler|AnalysisTaskExecutor retrieves jobs from here for execution. Manually submitted jobs always take priority over automatically triggered ones.|
+|StatisticsCleaner|Cleans up expired statistics and job information.|
+|StatisticsRepository|Defines most of the related SQL.|
+|StatisticsUtil|Mainly helper methods, such as checking the status of the stats-related tables.|
+
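As a rough illustration of the scheduling rule in the table (manually submitted jobs always run before automatically triggered ones), here is a minimal Java sketch built on `java.util.concurrent.PriorityBlockingQueue`. `PendingTask` and `PriorityTaskQueue` are hypothetical names, not the actual `AnalysisTaskScheduler` API, and tasks with the same priority are assumed to run in submission (FIFO) order.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pending task: only the fields needed to order the queue. */
record PendingTask(String column, boolean manual, long seq) {}

/** Sketch of a scheduler queue where manual tasks always precede automatic ones. */
class PriorityTaskQueue {
    // Booleans compare false < true, so keying on !manual puts manual tasks first;
    // the sequence number then keeps FIFO order within each group.
    private static final Comparator<PendingTask> ORDER =
            Comparator.comparing((PendingTask t) -> !t.manual())
                    .thenComparingLong(PendingTask::seq);

    private final PriorityBlockingQueue<PendingTask> queue =
            new PriorityBlockingQueue<>(16, ORDER);
    private final AtomicLong nextSeq = new AtomicLong();

    /** Enqueues a task; manual=true for user-submitted jobs, false for auto-triggered ones. */
    void submit(String column, boolean manual) {
        queue.add(new PendingTask(column, manual, nextSeq.getAndIncrement()));
    }

    /** Returns the next task to run, or null if nothing is pending. */
    PendingTask poll() {
        return queue.poll();
    }
}
```

Keeping the whole rule in one comparator means an executor only ever calls `poll()` and never needs to know why one task outranks another.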
+## Analyze execution flow
+```mermaid
+sequenceDiagram
+DdlExecutor->>AnalysisManager: createAnalysisJob
+AnalysisManager->>AnalysisManager: validateAndGetPartitions
+AnalysisManager->>AnalysisManager: createTaskForEachColumns
+AnalysisManager->>AnalysisManager: createTaskForMVIdx
+alt is sync task
+    AnalysisManager->>AnalysisManager: syncExecute
+else is async task
+    AnalysisManager->>StatisticsRepository: persist
+    StatisticsRepository->>BE: write
+    AnalysisManager->>AnalysisTaskScheduler: schedule
+    AnalysisTaskScheduler->>AnalysisTaskExecutor: notify
+    AnalysisTaskExecutor->>AnalysisTaskScheduler: getPendingTasks
+    AnalysisTaskExecutor->>ThreadPoolExecutor: submit(AnalysisTaskWrapper)
+    ThreadPoolExecutor->>AnalysisTaskWrapper: run
+    AnalysisTaskWrapper->>BE: collect && write
+    AnalysisTaskWrapper->>StatisticCache: refresh
+    AnalysisTaskWrapper->>AnalysisManager: updateTaskStatus
+    alt all tasks finished
+        AnalysisManager->>StatisticsUtil: execUpdate (mark job finished)
+        StatisticsUtil->>BE: update job status
+    end
+end
+
+```
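The async branch of the diagram can be sketched in plain Java: one task per column, each wrapped in a `FutureTask` subclass whose `done()` hook reports the outcome, with the last task to report marking the job finished. `TaskState`, `JobTracker`, `TaskWrapper`, and `AnalyzeJobSketch` are hypothetical stand-ins for `AnalysisManager` and `AnalysisTaskWrapper`, and the persistence steps the diagram routes through `StatisticsRepository` and `StatisticsUtil` are omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

enum TaskState { RUNNING, FINISHED, FAILED }

/** Hypothetical bookkeeping for one job: per-column task state plus a finished flag. */
class JobTracker {
    final Map<String, TaskState> taskStates = new ConcurrentHashMap<>();
    private final AtomicInteger remaining;
    volatile boolean jobFinished;

    JobTracker(List<String> columns) {
        columns.forEach(c -> taskStates.put(c, TaskState.RUNNING));
        remaining = new AtomicInteger(columns.size());
    }

    /** Records one task's final state; the last task to report marks the job finished. */
    void updateTaskStatus(String column, TaskState state) {
        taskStates.put(column, state);
        if (remaining.decrementAndGet() == 0) {
            jobFinished = true; // the diagram persists this via StatisticsUtil (execUpdate)
        }
    }
}

/** Sketch of a FutureTask subclass that reports its outcome, in the spirit of AnalysisTaskWrapper. */
class TaskWrapper extends FutureTask<Void> {
    private final String column;
    private final JobTracker tracker;

    TaskWrapper(String column, Runnable collectAndWrite, JobTracker tracker) {
        super(collectAndWrite, null);
        this.column = column;
        this.tracker = tracker;
    }

    @Override
    protected void done() {
        // FutureTask calls done() once, after normal completion, failure, or cancellation.
        TaskState state;
        try {
            get(); // does not block: the task has already completed
            state = TaskState.FINISHED;
        } catch (Exception e) {
            state = TaskState.FAILED; // ExecutionException, CancellationException, ...
        }
        tracker.updateTaskStatus(column, state);
    }
}

class AnalyzeJobSketch {
    /** Creates one task per column and hands each wrapper to the pool. */
    static JobTracker runJob(List<String> columns, ExecutorService pool) {
        JobTracker tracker = new JobTracker(columns);
        for (String column : columns) {
            // Hypothetical placeholder for "collect && write"; an empty name simulates a failure.
            Runnable collectAndWrite = () -> {
                if (column.isEmpty()) {
                    throw new IllegalStateException("cannot analyze an unnamed column");
                }
            };
            pool.execute(new TaskWrapper(column, collectAndWrite, tracker));
        }
        return tracker;
    }
}
```

Handing the wrapper to `execute` rather than `submit` avoids wrapping it in a second `FutureTask`; `done()` then runs on the worker thread right after the task body finishes or fails.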
+## Load execution flow
+
+```mermaid
+sequenceDiagram
+StatsCalculator->>StatisticCache: get
+alt is cached
+    StatisticCache->>StatsCalculator: return cached stats
+else not cached
+    StatisticCache->>StatsCalculator: return UNKNOWN stats
+    StatisticCache->>ThreadPoolExecutor: submit load task
+    ThreadPoolExecutor->>AsyncTask: get
+    AsyncTask->>StatisticsUtil: execStatisticQuery
+    alt exception occurred
+        AsyncTask->>StatisticCache: return UNKNOWN stats
+        StatisticCache->>StatisticCache: cache UNKNOWN for the column
+    else no exception
+        StatisticsUtil->>AsyncTask: return result rows
+        AsyncTask->>StatisticsUtil: deserializeToColumnStatistics(result rows)
+        alt exception occurred
+            AsyncTask->>StatisticCache: return UNKNOWN stats
+            StatisticCache->>StatisticCache: cache UNKNOWN for the column
+        else no exception
+            StatisticCache->>StatisticCache: cache normal stats
+        end
+    end
+
+end
+```
+
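The load flow amounts to a non-blocking cache: a lookup never waits for the query, a miss returns UNKNOWN and schedules a background load, and any exception during the query or deserialization caches UNKNOWN for that column. Below is a minimal Java sketch under those assumptions. `AsyncStatsCache` is a hypothetical name, the `loader` function stands in for `execStatisticQuery` plus `deserializeToColumnStatistics`, and it is not the actual `StatisticCache` implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/** Sketch of a non-blocking stats cache following the load flow diagram. */
class AsyncStatsCache<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Map<String, Boolean> inFlight = new ConcurrentHashMap<>();
    private final Executor loadPool;
    private final Function<String, V> loader; // query + deserialize; may throw
    private final V unknown;                  // sentinel returned on a miss or a failed load

    AsyncStatsCache(Executor loadPool, Function<String, V> loader, V unknown) {
        this.loadPool = loadPool;
        this.loader = loader;
        this.unknown = unknown;
    }

    /** Returns cached stats, or UNKNOWN immediately while a background load is scheduled. */
    V get(String column) {
        V cached = cache.get(column);
        if (cached != null) {
            return cached;
        }
        // Schedule at most one load per column at a time.
        if (inFlight.putIfAbsent(column, Boolean.TRUE) == null) {
            loadPool.execute(() -> load(column));
        }
        return unknown;
    }

    private void load(String column) {
        V stats;
        try {
            V loaded = loader.apply(column);
            stats = (loaded != null) ? loaded : unknown;
        } catch (RuntimeException e) {
            stats = unknown; // query or deserialization failed: cache UNKNOWN for the column
        }
        cache.put(column, stats);
        inFlight.remove(column);
    }
}
```

Because a failed load caches UNKNOWN, later lookups of that column return immediately instead of re-running the failing query; the `inFlight` map only prevents duplicate loads while one is already running.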
+# Configure options
+
+# User interface
+
+# Test

