This is an automated email from the ASF dual-hosted git repository.
morrysnow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new bd23db762d [minor](stats) Add doc for stats framework (#19311)
bd23db762d is described below
commit bd23db762d913bd186a46eb6bf50d46210e99b82
Author: AKIRA <[email protected]>
AuthorDate: Sat May 6 14:30:55 2023 +0900
[minor](stats) Add doc for stats framework (#19311)
---
.../java/org/apache/doris/statistics/README.md | 116 +++++++++++++++++++++
1 file changed, 116 insertions(+)
diff --git a/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
new file mode 100644
index 0000000000..9f4e9034d7
--- /dev/null
+++ b/fe/fe-core/src/main/java/org/apache/doris/statistics/README.md
@@ -0,0 +1,116 @@
+- [Requiredments](#requiredments)
+ - [Basic](#basic)
+ - [Adavanced(Not finished yet)](#adavancednot-finished-yet)
+ - [Specification](#specification)
+ - [Compatibility](#compatibility)
+ - [Function compatibility](#function-compatibility)
+ - [Version compatibility](#version-compatibility)
+- [Implementation](#implementation)
+ - [Main class](#main-class)
+ - [Analyze execution flow](#analyze-execution-flow)
+ - [Load execution flow](#load-execution-flow)
+- [Configure options](#configure-options)
+- [User interface](#user-interface)
+- [Test](#test)
+
+# Requiredments
+
+## Basic
+
+Provide necessary data for the optimizer to calculate and compare various
plans. This includes count, ndv, null_count, min, max, data_size, histogram for
each column, as well as the number of rows in the table.
+
+## Adavanced(Not finished yet)
+
+Support incremental collectio and auto collection
+
+## Specification
+
+## Compatibility
+
+### Function compatibility
+
+No conflicts with any other function.
+
+### Version compatibility
+
+There may be compatibility issues if there are changes to the schema of the
stats table in the future.
+
+# Implementation
+
+
+## Main class
+
+|Class name|Function|
+|---|---|
+|AnalyzeStmt|Constructed by parsing user-input SQL, each AnalyzeStmt
corresponds to a Job, and a Job can have multiple Tasks, with each Task
responsible for collecting statistics information on a column.|
+|AnalysisManager|Mainly responsible for managing Analyze Jobs/Tasks, including
creation, execution, cancellation, and status updates, etc.|
+|StatisticCache|The collected statistical information is cached here on
demand.|
+|StatisticCacheLoader|When `StatsCalculator#computeScan` fails to find the
corresponding stats for a column in the cache, the load logic will be
triggered, which is implemented in this class.|
+|AnalysisTaskExecutor|Used to excute AnalyzeJob|
+|AnalysisTaskWrapper|This class encapsulates an `AnalysisTask` and extends
`FutureTask`. It overrides some methods for state updates.|
+|AnalysisTaskScheduler|AnalysisTaskExecutor retrieves jobs from here for
execution. Manually submitted jobs always have higher priority than
automatically triggered ones.|
+|StatisticsCleaner|Responsible for cleaning up expired statistics and job
information.|
+|StatisticsRepository|Most of the related SQL is defined here.|
+|StatisticsUtil|Mainly consists of helper methods, such as checking the status
of stats-related tables.|
+
+## Analyze execution flow
+```mermaid
+sequenceDiagram
+DdlExecutor->>AnalysisManager: createAnalysisJob
+AnalysisManager->>AnalysisManager: validateAndGetPartitions
+AnalysisManager->>AnalysisManager: createTaskForEachColumns
+AnalysisManager->>AnalysisManager: createTaskForMVIdx
+alt is sync task
+ AnalysisManager->>AnalysisManager: syncExecute
+else is async task
+ AnalysisManager->>StatisticsRepository: persist
+ StatisticsRepository->>BE: write
+ AnalysisManager->>AnalysisTaskScheduler: schedule
+ AnalysisTaskScheduler->>AnalysisTaskExecutor: notify
+ AnalysisTaskExecutor->>AnalysisTaskScheduler: getPendingTasks
+ AnalysisTaskExecutor->>ThreadPoolExecutor: submit(AnalysisTaskWrapper)
+ ThreadPoolExecutor->>AnalysisTaskWrapper: run
+ AnalysisTaskWrapper->>BE: collect && write
+ AnalysisTaskWrapper->>StatisticCache: refresh
+ AnalysisTaskWrapper->>AnalysisManager: updateTaskStatus
+ alt is all task finished
+ AnalysisManager->> StatisticsUtil: execUpdate mark job finished
+ StatisticsUtil->> BE: update job status
+ end
+end
+
+```
+## Load execution flow
+
+```mermaid
+sequenceDiagram
+StatsCalculator->>StatisticCache: get
+alt is cached
+ StatisticCache->>StatsCalculator: return cached stats
+else not cached
+ StatisticCache->>StatsCalculator: return UNKNOWN stats
+ StatisticCache->>ThreadPoolExecutor: submit load task
+ ThreadPoolExecutor->>AsyncTask: get
+ AsyncTask->>StatisticsUtil: execStatisticQuery
+ alt exception occurred:
+ AsyncTask->>StatisticCache: return UNKNOWN stats
+ StatisticCache->> StatisticCache: cache UNKNOWN for the column
+ else no exception:
+ StatisticsUtil->>AsyncTask: Return results rows
+ AsyncTask->>StatisticsUtil: deserializeToColumnStatistics(result
rows)
+ alt exception occurred:
+ AsyncTask->>StatisticCache: return UNKNOWN stats
+ StatisticCache->> StatisticCache: cache UNKNOWN for the column
+ else no exception:
+ StatisticCache->> StatisticCache: cache normal stats
+ end
+ end
+
+end
+```
+
+# Configure options
+
+# User interface
+
+# Test
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]