This is an automated email from the ASF dual-hosted git repository.
kassiez pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new edefa48ac9 [Improvement](docs) refine query acceleration doc (#1586)
edefa48ac9 is described below
commit edefa48ac97832385c1ef575a9e3359caabacd41
Author: xzj7019 <[email protected]>
AuthorDate: Wed Dec 25 10:37:04 2024 +0800
[Improvement](docs) refine query acceleration doc (#1586)
## Versions
- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [x] Checked by AI
- [ ] Test Cases Built
---
.../performance-tuning-overview/analysis-tools.md | 147 +++++++++++++++++++++
.../diagnostic-tools.md | 107 +++++++++++++++
.../performance-tuning-overview/tuning-overview.md | 36 +++++
.../performance-tuning-overview/tuning-process.md | 76 +++++++++++
.../tuning/tuning-plan/optimizing-table-index.md | 6 +-
.../tuning-plan/optimizing-table-scanning.md | 7 +-
.../tuning/tuning-plan/optimizing-table-schema.md | 21 +--
.../performance-tuning-overview/analysis-tools.md | 16 +--
.../diagnostic-tools.md | 2 -
.../performance-tuning-overview/tuning-process.md | 4 +-
.../accelerating-queries-with-sql-cache.md | 3 +-
.../transparent-rewriting-with-sync-mv.md | 1 -
12 files changed, 396 insertions(+), 30 deletions(-)
diff --git
a/docs/query-acceleration/performance-tuning-overview/analysis-tools.md
b/docs/query-acceleration/performance-tuning-overview/analysis-tools.md
index e69de29bb2..c921000ab8 100644
--- a/docs/query-acceleration/performance-tuning-overview/analysis-tools.md
+++ b/docs/query-acceleration/performance-tuning-overview/analysis-tools.md
@@ -0,0 +1,147 @@
+---
+{
+ "title": "Analysis Tools",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Overview
+
+The previous section on [diagnostic tools](diagnostic-tools.md) helped
business and operations personnel pinpoint specific slow SQL queries. This
section will introduce how to analyze the performance bottlenecks of slow SQL
to determine which part of the SQL execution process is causing the slowdown.
Powerful analysis tools are the foundation and guarantee for locating
performance issues.
+
+The execution process of an SQL query can be roughly divided into two stages:
plan generation and plan execution. The former is responsible for generating
the execution plan, while the latter executes the specific plan. Issues in
either part can lead to performance bottlenecks. For example, if a poor plan is
generated, no matter how excellent the executor is, good performance cannot be
achieved. Similarly, even with a correct plan, inappropriate execution methods
can also lead to perform [...]
+
+All three types of problems require the support of good analysis tools. Based
on this, the Doris system provides two performance analysis tools to analyze
bottlenecks in planning and execution respectively. Additionally, the system
level also offers corresponding performance monitoring tools to assist in
locating performance bottlenecks. The following sections will introduce these
three aspects:
+
+## Doris Explain
+
+An execution plan describes the specific execution method and process of an
SQL query. For example, for an SQL query that joins two tables, the execution
plan will show information such as how the tables are accessed, the join
method, and the join order.
+
+Doris provides the Explain tool, which conveniently displays detailed
information about an SQL query's execution plan. By analyzing the plan output
by Explain, users can quickly locate bottlenecks at the planning level and
perform plan-level tuning based on different situations.
+
+Doris offers multiple Explain tools with different levels of granularity, such
as Explain Verbose, Explain All Plan, Explain Memo Plan, and Explain Shape
Plan, which are used to display the final physical plan, logical plans at
various stages, plans based on cost optimization processes, and plan shapes,
respectively. For detailed information, please refer to the Execution Plan
Explain section to learn about the usage of various Explain tools and the
interpretation of their output information.
+
+By analyzing the output of Explain, business personnel and DBAs can quickly
locate performance bottlenecks in the current plan. For example, by analyzing
the execution plan, it may be discovered that filters are not pushed down to
the base tables, resulting in data not being filtered early and an excessive
amount of data being involved in calculations, leading to performance issues.
Another example is that in an Inner equi-join of two tables, the filter
conditions on one side of the join [...]
+
+For cases of using Doris Explain output to perform plan-level tuning, please
refer to the [Plan Tuning](../tuning/tuning-plan/optimizing-table-schema.md)
section.
+
+## Doris Profile
+
+The Explain tool described above outlines the execution plan for an SQL query,
such as planning a join operation between tables t1 and t2 as a Hash Join, with
t1 designated as the build side and t2 as the probe side. When the SQL query is
actually executed, understanding how much time each specific execution step
takes—for instance, how long the build phase lasts and how long the probe phase
lasts—is crucial for performance analysis and tuning. The Profile tool provides
detailed executio [...]
+
+## Profile File Structure
+
+A Profile file contains several main sections:
+
+1. Basic query information: including ID, time, database, etc.
+2. The SQL statement and its execution plan.
+3. Time spent by the Frontend (FE) on tasks like Plan Time, Schedule Time, etc.
+4. Execution time spent by each operator during the Backend (BE) processing
(including Merged Profile and Execution Profile).
+
+5. The detailed information about the execution side is mainly contained in
the last part. Next, we will mainly introduce what information the Profile can
provide for performance analysis.
+
+## Merged Profile
+
+To help users more accurately analyze performance bottlenecks, Doris provides
aggregated profile results for each operator. Taking the EXCHANGE_OPERATOR as
an example:
+
+```sql
+EXCHANGE_OPERATOR (id=4):
+ - BlocksProduced: sum 0, avg 0, max 0, min 0
+ - CloseTime: avg 34.133us, max 38.287us, min 29.979us
+ - ExecTime: avg 700.357us, max 706.351us, min 694.364us
+ - InitTime: avg 648.104us, max 648.604us, min 647.605us
+ - MemoryUsage: sum , avg , max , min
+ - PeakMemoryUsage: sum 0.00 , avg 0.00 , max 0.00 , min 0.00
+ - OpenTime: avg 4.541us, max 5.943us, min 3.139us
+ - ProjectionTime: avg 0ns, max 0ns, min 0ns
+ - RowsProduced: sum 0, avg 0, max 0, min 0
+ - WaitForDependencyTime: avg 0ns, max 0ns, min 0ns
+ - WaitForData0: avg 9.434ms, max 9.476ms, min 9.391ms
+```
+
+The Merged Profile consolidates key metrics for each operator, with the core
metrics and their meanings outlined below:
+
+| Metric Name | Metric Definition
|
+| ---------------------
|------------------------------------------------------------|
+| BlocksProduced | Number of Data Blocks produced
|
+| CloseTime | Time spent by the Operator during the close phase
|
+| ExecTime | Total execution time of the Operator across all
phases |
+| InitTime | Time spent by the Operator during the initialization
phase |
+| MemoryUsage | Memory usage of the Operator during execution
|
+| OpenTime | Time spent by the Operator during the open phase
|
+| ProjectionTime | Time spent by the Operator on projections
|
+| RowsProduced | Number of rows returned by the Operator
|
+| WaitForDependencyTime | Time the Operator waits for its execution
dependencies |
+
+In Doris, each operator executes concurrently based on the concurrency level
set by the user. Therefore, the Merged Profile calculates the Max, Avg, and Min
values for each metric across all concurrent executions.
+
+WaitForDependencyTime varies for each Operator, as the execution dependencies
differ. For instance, in the case of an EXCHANGE_OPERATOR, the dependency is on
data being sent by upstream operators via RPC. Thus, WaitForDependencyTime in
this context specifically refers to the time spent waiting for upstream
operators to send data.
+
+## Execution Profile
+
+Unlike the Merged Profile, the Execution Profile displays detailed metrics for
a specific concurrent execution. Taking the exchange operator with id=4 as an
example:
+
+```sql
+EXCHANGE_OPERATOR (id=4):(ExecTime: 706.351us)
+ - BlocksProduced: 0
+ - CloseTime: 38.287us
+ - DataArrivalWaitTime: 0ns
+ - DecompressBytes: 0.00
+ - DecompressTime: 0ns
+ - DeserializeRowBatchTimer: 0ns
+ - ExecTime: 706.351us
+ - FirstBatchArrivalWaitTime: 0ns
+ - InitTime: 647.605us
+ - LocalBytesReceived: 0.00
+ - MemoryUsage:
+ - PeakMemoryUsage: 0.00
+ - OpenTime: 5.943us
+ - ProjectionTime: 0ns
+ - RemoteBytesReceived: 0.00
+ - RowsProduced: 0
+ - SendersBlockedTotalTimer(*): 0ns
+ - WaitForDependencyTime: 0ns
+ - WaitForData0: 9.476ms
+```
+
+In this profile, for instance, LocalBytesReceived is a metric specific to the
exchange operator and not found in other operators, hence it is not included in
the Merged Profile.
+
+## PipelineTask Execution Time
+
+In Doris, a PipelineTask consists of multiple operators. When analyzing the
execution time of a PipelineTask, several key aspects need to be focused on:
+
+1. ExecuteTime: 1.656ms (The actual execution time of the entire PipelineTask,
which is approximately the sum of the ExecTime of all operators within the
task).
+2. WaitWorkerTime: 63.868us (The time the task waits for an execution worker.
When the task is in a runnable state, it waits for an available worker to
execute it, and this duration primarily depends on the cluster load).
+3. Time Waiting for Execution Dependencies: 10.495ms (WaitForBroadcastBuffer +
WaitForRpcBufferQueue +
WaitForDependency[AGGREGATION_OPERATOR_DEPENDENCY]Time). The time a task waits
for execution dependencies is the sum of the waiting times for these
dependencies.
+
+For cases of using Profile for execution-level tuning, please refer to the
[Tuning
Execution](../tuning/tuning-execution/adjustment-of-runtimefilter-wait-time.md)
section.
+
+## System-Level Performance Tools
+
+Commonly used system tools can assist in identifying performance bottlenecks
during execution. For instance, widely used Linux tools such as top, free,
perf, sar, and iostats can be utilized to observe the CPU, memory, I/O, and
network status of the system while SQL is running, thereby aiding in the
identification of performance bottlenecks.
+
+## Summary
+
+Effective performance analysis tools are crucial for quickly identifying
performance bottlenecks. Doris provides Explain and Profile, offering powerful
support for analyzing issues with execution plans and identifying which
operations consume the most time during execution. Additionally, proficient use
of system-level analysis tools can greatly assist in locating performance
bottlenecks.
+
+
+
diff --git
a/docs/query-acceleration/performance-tuning-overview/diagnostic-tools.md
b/docs/query-acceleration/performance-tuning-overview/diagnostic-tools.md
index e69de29bb2..683bf8135c 100644
--- a/docs/query-acceleration/performance-tuning-overview/diagnostic-tools.md
+++ b/docs/query-acceleration/performance-tuning-overview/diagnostic-tools.md
@@ -0,0 +1,107 @@
+---
+{
+ "title": "Diagnostic Tools",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Overview
+
+Efficient and effective performance diagnostic tools are crucial for database
system tuning, as they determine whether problematic business SQL queries can
be quickly identified, and subsequently, performance bottlenecks can be rapidly
pinpointed and resolved, ensuring that the database system meets its Service
Level Agreements (SLAs).
+
+Currently, Doris considers SQL queries with execution times exceeding 5
seconds as slow SQL by default. This threshold can be configured via
`config.qe_slow_log_ms`. Doris currently offers the following three diagnostic
channels to help quickly identify slow SQL queries with performance issues:
+
+## Doris Manager Logs
+
+The log module in Doris Manager provides a slow SQL filtering function. Users
can view slow SQL by selecting the `fe.audit.log` on a specific FE node. By
simply entering `slow_query` in the search box, the historical slow SQL
information of the current system will be displayed on the page, as shown in
the figure below:
+
+
+
+## Audit Log
+
+Currently, Doris FE provides four types of Audit Logs, including `slow_query`,
`query`, `load`, and `stream_load`. Besides accessing the logs through the log
page on the cluster where the Manager service is installed and deployed, Audit
Logs can also be directly obtained by accessing the `fe/log/fe.audit.log` file
on the node where FE is located.
+
+By directly searching for the `slow_query` tag in `fe.audit.log`, you can
quickly filter out slow-executing SQL queries, as shown below:
+
+```sql
+2024-07-18 11:23:13,042 [slow_query]
|Client=127.0.0.1:63510|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=11603|ScanBytes=236667379712|ScanRows=13649979418|ReturnRows=100|StmtId=1689|QueryId=91ff336304f14182-9ca537eee75b3856|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select
c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
sum(l_quantity) from customer, orders, lineitem where
o_orderkey in ( [...]
+2024-07-18 11:23:33,043 [slow_query]
|Client=127.0.0.1:26672|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=8978|ScanBytes=334985555968|ScanRows=10717654374|ReturnRows=100|StmtId=1815|QueryId=6e1fae453cb04d9a-b1e5f94d9cea1885|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select
s_name, count(*) as numwait from supplier, lineitem l1,
orders, nation where s_suppkey = l1.l_suppkey and o_orderkey =
l1.l_orderkey and [...]
+2024-07-18 11:23:41,044 [slow_query]
|Client=127.0.0.1:26684|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=8514|ScanBytes=334986551296|ScanRows=10717654374|ReturnRows=100|StmtId=1833|QueryId=4f91483464ce4aa8-beeed7dcb8675bc8|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select
s_name, count(*) as numwait from supplier, lineitem l1,
orders, nation where s_suppkey = l1.l_suppkey and o_orderkey =
l1.l_orderkey and [...]
+2024-07-18 11:23:49,044 [slow_query]
|Client=127.0.0.1:10748|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=8660|ScanBytes=334987673600|ScanRows=10717654374|ReturnRows=100|StmtId=1851|QueryId=4599cb1bab204f80-ac430dd78b45e3da|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select
s_name, count(*) as numwait from supplier, lineitem l1,
orders, nation where s_suppkey = l1.l_suppkey and o_orderkey =
l1.l_orderkey and [...]
+```
+
+The slow SQL obtained through `fe.audit.log` allows users to easily access
detailed information such as execution time, number of rows scanned, number of
rows returned, and the SQL statement itself, laying the foundation for further
reproducing and locating performance issues.
+
+## audit_log System Table
+
+Starting from Doris version 2.1, the `audit_log` system table is provided
under the `__internal_schema` database for users to view the execution status
of SQL queries. Before using it, the global configuration `set global
enable_audit_plugin=true`; needs to be enabled (this switch is disabled by
default).
+
+```sql
+mysql> use __internal_schema;
+Reading table information for completion of table and column names
+You can turn off this feature to get a quicker startup with -A
+
+Database changed
+mysql> show tables;
++-----------------------------+
+| Tables_in___internal_schema |
++-----------------------------+
+| audit_log |
+| column_statistics |
+| histogram_statistics |
+| partition_statistics |
++-----------------------------+
+4 rows in set (0.00 sec)
+
+mysql> desc audit_log;
++-------------------+--------------+------+-------+---------+-------+
+| Field | Type | Null | Key | Default | Extra |
++-------------------+--------------+------+-------+---------+-------+
+| query_id | varchar(48) | Yes | true | NULL | |
+| time | datetime | Yes | true | NULL | |
+| client_ip | varchar(128) | Yes | true | NULL | |
+| user | varchar(128) | Yes | false | NULL | NONE |
+| catalog | varchar(128) | Yes | false | NULL | NONE |
+| db | varchar(128) | Yes | false | NULL | NONE |
+| state | varchar(128) | Yes | false | NULL | NONE |
+| error_code | int | Yes | false | NULL | NONE |
+| error_message | text | Yes | false | NULL | NONE |
+| query_time | bigint | Yes | false | NULL | NONE |
+| scan_bytes | bigint | Yes | false | NULL | NONE |
+| scan_rows | bigint | Yes | false | NULL | NONE |
+| return_rows | bigint | Yes | false | NULL | NONE |
+| stmt_id | bigint | Yes | false | NULL | NONE |
+| is_query | tinyint | Yes | false | NULL | NONE |
+| frontend_ip | varchar(128) | Yes | false | NULL | NONE |
+| cpu_time_ms | bigint | Yes | false | NULL | NONE |
+| sql_hash | varchar(128) | Yes | false | NULL | NONE |
+| sql_digest | varchar(128) | Yes | false | NULL | NONE |
+| peak_memory_bytes | bigint | Yes | false | NULL | NONE |
+| stmt | text | Yes | false | NULL | NONE |
++-------------------+--------------+------+-------+---------+-------+
+```
+
+Through the `audit_log` internal table, users can query detailed SQL execution
information and perform detailed statistical analysis such as slow query
filtering.
+
+## Summary
+
+Doris Manager logs, audit logs, and the `audit_log` system table provide
capabilities such as automatic or manual filtering of slow SQL queries, as well
as fine-grained statistical analysis of SQL execution information. These tools
offer powerful support for systematic performance diagnosis and tuning.
\ No newline at end of file
diff --git
a/docs/query-acceleration/performance-tuning-overview/tuning-overview.md
b/docs/query-acceleration/performance-tuning-overview/tuning-overview.md
index e69de29bb2..83ec5131e3 100644
--- a/docs/query-acceleration/performance-tuning-overview/tuning-overview.md
+++ b/docs/query-acceleration/performance-tuning-overview/tuning-overview.md
@@ -0,0 +1,36 @@
+---
+{
+ "title": "Tuning Overview",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Query performance tuning is a systematic process that requires multi-level and
multi-dimensional adjustments to the database system. Below is an overview of
the tuning process and methodology:
+
+1. Firstly, business personnel and database administrators (DBAs) need to have
a comprehensive understanding of the database system being used, including the
hardware utilized by the business system, the scale of the cluster, the version
of the database software being used, as well as the features provided by the
specific software version.
+2. Secondly, an effective performance diagnostic tool is a necessary
prerequisite for identifying performance issues. Only by efficiently and
quickly locating problematic SQL queries or slow SQL queries can subsequent
specific performance tuning processes be carried out.
+3. After entering the performance tuning phase, a range of commonly used
performance analysis tools are indispensable. These include specialized tools
provided by the currently running database system, as well as general tools at
the operating system level.
+4. With these tools in place, specialized tools can be used to obtain detailed
information about SQL queries running on the current database system, aiding in
the identification of performance bottlenecks. Meanwhile, general tools can
serve as auxiliary analysis methods to assist in locating issues.
+
+In summary, performance tuning requires evaluating the current system's
performance status from a holistic perspective. Firstly, it is necessary to
identify business SQL queries with performance issues, then utilize analysis
tools to discover performance bottlenecks, and finally implement specific
tuning operations.
+
+Based on the aforementioned tuning process and methodology, Apache Doris
provides corresponding tools at each of these levels. The following sections
will introduce the performance [diagnostic tools](diagnostic-tools.md),
[analysis tools](analysis-tools.md), and [tuning process](tuning-process.md)
respectively.
\ No newline at end of file
diff --git
a/docs/query-acceleration/performance-tuning-overview/tuning-process.md
b/docs/query-acceleration/performance-tuning-overview/tuning-process.md
index e69de29bb2..b8dce7369b 100644
--- a/docs/query-acceleration/performance-tuning-overview/tuning-process.md
+++ b/docs/query-acceleration/performance-tuning-overview/tuning-process.md
@@ -0,0 +1,76 @@
+---
+{
+"title": "Tuning Process",
+"language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Overview
+
+Performance tuning is a systematic process that requires a comprehensive
methodology and implementation framework for systematic diagnosis and
optimization. With the strong support of [diagnostic
tools](diagnostic-tools.md) and [analysis tools](analysis-tools.md), the Doris
system can efficiently diagnose, analyze, locate, and resolve performance
issues. The complete four-step process for tuning is as follows:
+
+
+
+## Step 1: Use Performance Diagnostic Tools to Identify Slow Queries
+
+For business systems running on Doris, use the aforementioned [performance
diagnostic tools](diagnostic-tools.md) to identify slow SQL queries.
+
+- If Doris Manager is installed, it is recommended to use the Manager's log
page for convenient visual identification of slow queries.
+- If Manager is not installed, you can directly check the `fe.audit.log` file
on the FE node or the audit_log system table to obtain a list of slow SQL
queries and prioritize them for tuning.
+
+## Step 2: Schema Design and Tuning
+
+After identifying specific slow SQL queries, the first priority is to inspect
and tune the business schema design to eliminate performance issues caused by
unreasonable schema design.
+
+Schema design tuning can be divided into three aspects:
+
+- [Table-level Schema Design
Tuning](../tuning/tuning-plan/optimizing-table-schema.md), such as adjusting
the number of partitions and buckets, and field optimization;
+- [Index Design and Tuning](../tuning/tuning-plan/optimizing-table-index.md)
+- The use of specific optimization techniques, such as [Optimizing Join with
Colocate Group](../tuning/tuning-plan/optimizing-join-with-colocate-group.md).
The main goal is to eliminate performance issues caused by unreasonable schema
design or failure to fully leverage Doris's existing optimization capabilities.
+
+For detailed tuning examples, please refer to the documentation on [Plan
Tuning](../tuning/tuning-plan/optimizing-table-schema.md).
+
+## Step 3: Plan Tuning
+
+After inspecting and tuning the business schema, the main task of tuning
begins: plan tuning and execution tuning. As mentioned above, at this stage,
the primary task is to make full use of the various levels of Explain tools
provided by Doris to systematically analyze the execution plans of slow SQL
queries and identify key optimization points for targeted optimization.
+
+- For single-table query and analysis scenarios, you can analyze the execution
plan to check if [partition
pruning](../tuning/tuning-plan/optimizing-table-scanning.md) is working
properly and [use single-table materialized views for query
acceleration](../tuning/tuning-plan/transparent-rewriting-with-sync-mv.md).
+- For complex multi-table analysis scenarios, you can analyze the Join Order
to determine if it is reasonable and identify specific performance bottlenecks.
You can also [use multi-table materialized views for transparent rewriting to
accelerate
queries](../tuning/tuning-plan/transparent-rewriting-with-async-mv.md). If
unexpected situations occur, such as unreasonable Join Order, you can manually
specify the Join Hint to bind the execution plan, such as [using the Leading
hint to control [...]
+- For specific scenarios, you can also leverage advanced features provided by
Doris, such as [using SQL Cache to accelerate
queries](../tuning/tuning-plan/accelerating-queries-with-sql-cache.md).
+
+For detailed tuning examples, please refer to the documentation on [Plan
Tuning](../tuning/tuning-plan/optimizing-table-schema.md).
+
+## Step 4: Execution Tuning
+
+In the execution tuning stage, you need to validate the effectiveness of plan
tuning based on the actual execution of SQL queries. Additionally, within the
framework of the existing plan, continue to analyze bottlenecks on the
execution side, identify which execution stages are slow, or other common
issues such as suboptimal parallelism.
+
+Taking multi-table analysis queries as an example, you can analyze the Profile
to check if the planned Join order is reasonable, if Runtime Filters are
effective, and if the parallelism meets expectations. Furthermore, the Profile
can provide feedback on machine load, such as slow I/O or unexpected network
transmission performance. When confirming and diagnosing such issues,
system-level tools are needed to assist in diagnosis and tuning.
+
+For detailed tuning examples, please refer to the documentation on [Execution
Tuning](../tuning/tuning-execution/adjustment-of-runtimefilter-wait-time.md).
+
+:::tip
+When analyzing specific performance issues, it is recommended to first check
the plan and then tune the execution. Start by using the Explain tool to
confirm the execution plan, and then use the Profile tool to locate and tune
execution performance. Reversing the order may lead to inefficiencies and
hinder the rapid identification of performance issues.
+:::
+
+## Summary
+
+Query tuning is a systematic process, and Doris provides users with tools
across various dimensions to facilitate the diagnosis, identification,
analysis, and resolution of performance issues at different levels. By
familiarizing themselves with these diagnostic and analysis tools and adopting
reasonable tuning methods, business personnel and DBAs can quickly and
effectively address performance bottlenecks, better unleash Doris's powerful
performance advantages, and better adapt to busin [...]
diff --git
a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md
b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md
index 6e5af4544f..4748a908e1 100644
--- a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md
+++ b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md
@@ -24,20 +24,20 @@ specific language governing permissions and limitations
under the License.
-->
+## Overview
Doris currently supports two types of indexes:
1. Built-in Indexes: These include prefix indexes, ZoneMap indexes, etc.
-
2. Secondary Indexes: These include inverted indexes, Bloom filter indexes,
N-Gram Bloom filter indexes, and Bitmap indexes, etc.
In the process of business optimization, fully analyzing business
characteristics and make effective use of indexes can greatly enhance the
effectiveness of queries and analyses, thereby achieving the purpose of
performance tuning.
-For a detailed introduction to various indexes, please refer to the [Table
Index](../../../table-design/index/index-overview) section. This chapter will
demonstrate index usage techniques in several typical scenarios from the
perspective of actual cases and summarize optimization suggestions for
reference in business tuning.
+For a detailed introduction to various indexes, please refer to the [Table
Index](../../../table-design/index/index-overview.md) section. This chapter
will demonstrate index usage techniques in several typical scenarios from the
perspective of actual cases and summarize optimization suggestions for
reference in business tuning.
## Case 1: Optimizing the Order of Key Columns to Leverage Prefix Indexes for
Accelerated Queries
-In optimizing table schema design, we have introduced how to select
appropriate fields as key fields and utilize Doris's key column sorting feature
to accelerate queries. This case will further expand on this scenario.
+In [optimizing table schema design](optimizing-table-schema.md), we have
introduced how to select appropriate fields as key fields and utilize Doris's
key column sorting feature to accelerate queries. This case will further expand
on this scenario.
Due to Doris's built-in prefix index function, it automatically takes the
first 36 bytes of the table's Key as a prefix index when creating the table.
When query conditions match the prefix of the prefix index, it can
significantly speed up the query. Below is an example of a table definition:
diff --git
a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-scanning.md
b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-scanning.md
index c97c91ffdd..9d42bbb9f4 100644
--- a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-scanning.md
+++ b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-scanning.md
@@ -24,13 +24,13 @@ specific language governing permissions and limitations
under the License.
-->
-## Principle
+## Overview
Doris, as a high-performance real-time analytics data warehouse, offers a
powerful partition pruning feature that can significantly enhance query
performance.
Partition pruning is a query optimization technique that intelligently
identifies partitions relevant to a query by analyzing its conditions, and
scans only the data within these partitions, thereby avoiding unnecessary scans
of irrelevant partitions. This approach can greatly reduce I/O operations and
computational load, thus accelerating query execution.
-## Use Case
+## Case
Here is a usage case to demonstrate Doris's partition pruning feature.
@@ -66,11 +66,8 @@ WHERE date BETWEEN '2023-01-15' AND '2023-02-15';
For the above query, Doris's partition pruning optimization process is as
follows:
1. Doris intelligently analyzes the partition column `date` in the query
conditions and identifies the date range of the query as being between
'2023-01-15' and '2023-02-15'.
-
2. By comparing the query conditions with the partition definitions, Doris
precisely locates the range of partitions that need to be scanned. In this
example, only partitions `p2` and `p3` need to be scanned, as their date ranges
fully cover the query conditions.
-
3. Doris automatically skips partitions unrelated to the query conditions,
such as `p1` and `p4`, avoiding unnecessary data scans and thereby reducing I/O
overhead.
-
4. Finally, Doris performs data scanning and aggregation computations only
within partitions `p2` and `p3`, quickly obtaining the query results.
By using the `EXPLAIN` command, we can view the query execution plan and
confirm that Doris's partition pruning optimization has taken effect. In the
execution plan, the `partition` attribute of the `OlapScanNode` node will
display the actually scanned partitions as `p2` and `p3`.
diff --git
a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md
b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md
index 65878ce7df..fac7e8859b 100644
--- a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md
+++ b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md
@@ -24,9 +24,11 @@ specific language governing permissions and limitations
under the License.
-->
+## Overview
+
In Schema design and tuning, table Schema design is a crucial part,
encompassing table engine selection, partition and bucket column selection,
partition and bucket size settings, key column and field type optimization,
etc. Systems lacking proper Schema design may encounter issues such as data
skew, failing to fully leverage system parallelism and sorting features,
thereby hindering the Doris system from realizing its true performance
advantages within business systems.
-Detailed design principles can be found in the [Data Table
Design](../../../table-design/overview) section for further information. This
chapter, from the perspective of practical cases, will showcase performance
bottlenecks caused by Schema design issues in several typical scenarios and
provide optimization suggestions for business tuning reference.
+Detailed design principles can be found in the [Data Table
Design](../../../table-design/overview.md) section for further information.
This chapter, from the perspective of practical cases, will showcase
performance bottlenecks caused by Schema design issues in several typical
scenarios and provide optimization suggestions for business tuning reference.
## Case 1: Table Engine Selection
@@ -36,7 +38,7 @@ The query performance of these table models, from best to
worst, is: Duplicate >
:::tip
-When the business has no data update requirements but high demands for query
performance, the [Duplicate table](../../../table-design/data-model/duplicate)
is recommended.
+When the business has no data update requirements but high demands for query
performance, the [Duplicate
table](../../../table-design/data-model/duplicate.md) is recommended.
:::
@@ -69,12 +71,14 @@ Therefore, during the Schema design phase, business
personnel need to design rea
select c2,count(*) cnt from t1 group by c2 order by cnt desc limit 10;
```
-It is clear that good prior design can significantly reduce the cost of
locating and correcting issues when they occur. Therefore, it is strongly
recommended that business personnel conduct rigorous design and checks during
the Schema design phase to avoid introducing unnecessary costs.
-
:::tip
+
Check whether the bucket column has data skew issues. If so, replace it with a
field that has adequate hashing characteristics in business meaning as the
bucket column.
+
:::
+It is clear that good prior design can significantly reduce the cost of
locating and correcting issues when they occur. Therefore, it is strongly
recommended that business personnel conduct rigorous design and checks during
the Schema design phase to avoid introducing unnecessary costs.
+
## Case 3: Key Column Optimization
Among the three table models, if the table creation Schema explicitly
specifies a Duplicate Key, Unique Key, or Aggregate Key, Doris will ensure that
data is sorted based on the Key column at the storage level. This feature
provides new ideas for data query performance optimization. Specifically,
during the Schema design phase, if columns frequently used for equality or
range queries in business queries can be defined as Key columns, it will
significantly increase the execution speed of [...]
@@ -102,7 +106,9 @@ PROPERTIES (
```
:::tip
+
Set columns frequently used in business queries as key columns to accelerate
the query process.
+
:::
## Case 4: Field Type Optimization
@@ -112,15 +118,14 @@ In database systems, the complexity of processing
different types of data can va
This characteristic provides important insights into the design and later
optimization of business system Schemas:
1. While meeting the expression and computation needs of business systems,
priority should be given to fixed-length types, avoiding the use of
variable-length types;
-
2. At the same time, low-precision types should be adopted instead of
high-precision types. Specific practices include using BIGINT to replace
VARCHAR or STRING type fields and using FLOAT / INT / BIGINT to replace DECIMAL
type fields. Reasonable design and optimization of such field types will
greatly enhance business computation efficiency, thereby improving system
performance.
:::tip
+
When defining Schema types, follow the principle of prioritizing fixed-length
and low-precision types.
+
:::
## Summary
-In summary, a well-designed Schema can maximize the utilization of Doris's
features, thereby significantly enhancing business performance. Conversely, a
non-optimized Schema design may have a global negative impact on the business,
such as causing data skew. Therefore, the initial Schema design optimization
work is particularly important.
-
-For performance tuning, you can also refer to using [Colocate Group to
optimize Join](../../../query-data/join#colocate-join). This document will
provide detailed instructions on how to fully leverage Doris's features for
performance optimization, offering strong support for improving your business
performance.
\ No newline at end of file
+In summary, a well-designed Schema can maximize the utilization of Doris's
features, thereby significantly enhancing business performance. Conversely, a
non-optimized Schema design may have a global negative impact on the business,
such as causing data skew. Therefore, the initial Schema design optimization
work is particularly important.
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/analysis-tools.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/analysis-tools.md
index f8142b5582..fcbfba99fa 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/analysis-tools.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/analysis-tools.md
@@ -46,7 +46,7 @@ Doris 提供了多种不同粒度的 Explain 工具,如 Explain Verbose、Expl
## Doris Profile
-上述 Explain 工具描述了一条 SQL 的执行的规划,比如一个 t1 和 t2 表的连接操作被规划成了 Hash Join 的执行方式,并且 t1
表被规划在 build 侧,t2 表被规划在 probe 侧。当 SQL 具体执行时,如何了解每个具体的执行分别耗费多少时间,比如 build
耗费多少时间,probe 耗费多少时间,profile 工具提供了详细的执行信息供性能分析和调优使用。下面部分先整体介绍 Profile
的文件结构,然后分别介绍 Merged Profile,Exection Profile 以及 PipelineTask 的执行时间含义:
+上述 Explain 工具描述了一条 SQL 的执行的规划,比如一个 t1 和 t2 表的连接操作被规划成了 Hash Join 的执行方式,并且 t1
表被规划在 build 侧,t2 表被规划在 probe 侧。当 SQL 具体执行时,如何了解每个具体的执行分别耗费多少时间,比如 build
耗费多少时间,probe 耗费多少时间,profile 工具提供了详细的执行信息供性能分析和调优使用。下面部分先整体介绍 Profile
的文件结构,然后分别介绍 Merged Profile,Execution Profile 以及 PipelineTask 的执行时间含义:
### Profile 文件结构
@@ -63,19 +63,19 @@ Profile 文件中包含几个主要的部分:
为了帮助用户更准确的分析性能瓶颈,Doris 提供了各个 operator 聚合后的 profile 结果。以 EXCHANGE_OPERATOR 为例:
-```python
+```sql
EXCHANGE_OPERATOR (id=4):
- BlocksProduced: sum 0, avg 0, max 0, min 0
- CloseTime: avg 34.133us, max 38.287us, min 29.979us
- ExecTime: avg 700.357us, max 706.351us, min 694.364us
- InitTime: avg 648.104us, max 648.604us, min 647.605us
- MemoryUsage: sum , avg , max , min
- - PeakMemoryUsage: sum 0.00 , avg 0.00 , max 0.00 , min
0.00
+ - PeakMemoryUsage: sum 0.00 , avg 0.00 , max 0.00 , min 0.00
- OpenTime: avg 4.541us, max 5.943us, min 3.139us
- ProjectionTime: avg 0ns, max 0ns, min 0ns
- RowsProduced: sum 0, avg 0, max 0, min 0
- WaitForDependencyTime: avg 0ns, max 0ns, min 0ns
- - WaitForData0: avg 9.434ms, max 9.476ms, min 9.391ms
+ - WaitForData0: avg 9.434ms, max 9.476ms, min 9.391ms
```
Merged Profile 对每个 operator 的核心指标做了合并,核心指标和含义包括:
@@ -100,7 +100,7 @@ Doris 中,每个 operator 根据用户设置的并发数并发执行,所以
区别于 Merged Profile,Execution Profile 展示的是具体的某个并发中的详细指标。依以 id=4 的这个 exchange
operator 为例:
-```python
+```sql
EXCHANGE_OPERATOR (id=4):(ExecTime: 706.351us)
- BlocksProduced: 0
- CloseTime: 38.287us
@@ -113,14 +113,14 @@ EXCHANGE_OPERATOR (id=4):(ExecTime: 706.351us)
- InitTime: 647.605us
- LocalBytesReceived: 0.00
- MemoryUsage:
- - PeakMemoryUsage: 0.00
+ - PeakMemoryUsage: 0.00
- OpenTime: 5.943us
- ProjectionTime: 0ns
- RemoteBytesReceived: 0.00
- RowsProduced: 0
- SendersBlockedTotalTimer(*): 0ns
- WaitForDependencyTime: 0ns
- - WaitForData0: 9.476ms
+ - WaitForData0: 9.476ms
```
在这个 profile 中,例如 LocalBytesReceived 是 exchange operator 特化的一个指标,其他的 operator
中没有,所以没在 Merged Profile 中包含。
@@ -133,7 +133,7 @@ EXCHANGE_OPERATOR (id=4):(ExecTime: 706.351us)
2. WaitWorkerTime:task 等待执行 worker 的时间。当 task 处于 runnable 状态时,他要等待一个空闲 worker
来执行,这个耗时主要取决于集群负载。
3. 等待执行依赖的时间:一个 task 可以执行的依赖条件是每个 operator 的 dependency 全部满足执行条件,而 task
等待执行依赖的时间就是将这些依赖的等待时间相加。例如简化这个例子中的其中一个 task:
-```python
+```sql
PipelineTask (index=1):(ExecTime: 4.773ms)
- ExecuteTime: 1.656ms
- CloseTime: 90.402us
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/diagnostic-tools.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/diagnostic-tools.md
index 28e663da79..7575f0e350 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/diagnostic-tools.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/diagnostic-tools.md
@@ -36,8 +36,6 @@ Doris Manager 的日志模块提供了慢 SQL 筛选功能。用户可以通过

-通过上述监控界面展示的问题时间点,和慢 SQL 中的详细信息如时间点/扫描条数/Shuffle 条数/peakMemory 大小等做交叉验证,确定问题
SQL 的来源。
-
## Audit Log
当前 Doris FE 提供了四种类型的 Audit Log,包括 `slow_query`、`query`、`load` 和
`stream_load`。Audit Log 除了在安装部署 Manager 服务的集群上通过日志页面访问获取之外,也可以直接访问 FE 所在节点的
`fe/log/fe.audit.log` 文件获取信息。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/tuning-process.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/tuning-process.md
index c59eace8af..856c0339f8 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/tuning-process.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/performance-tuning-overview/tuning-process.md
@@ -45,7 +45,7 @@ Schema 设计调优基本可分为三个方面:
- [表级别 Schema
设计调优](../tuning/tuning-plan/optimizing-table-schema.md),如分区分桶个数和字段调优;
- [索引的设计和调优](../tuning/tuning-plan/optimizing-table-index.md);
-- 特定优化手段的使用,如使用 [Colocate Group 优化
Join](../tuning/tuning-plan/optimizing-join-with-colocate-group.md) 等。主要目的是排除因为
Schema 设计不合理或者没有充分利用 Doris 现有优化能力导致的性能问题。
+- 特定优化手段的使用,如[使用 Colocate Group 优化
Join](../tuning/tuning-plan/optimizing-join-with-colocate-group.md) 等。主要目的是排除因为
Schema 设计不合理或者没有充分利用 Doris 现有优化能力导致的性能问题。
详细调优案例请参考文档 [计划调优](../tuning/tuning-plan/optimizing-table-schema.md)。
@@ -53,7 +53,7 @@ Schema 设计调优基本可分为三个方面:
检查和优化完业务 Schema 后,将进入调优的主体工作,即计划调优与执行调优。如上所述,在性能调优工具中,这个阶段的主要工作是充分利用 Doris
所提供的各种层级的 Explain 工具,对慢 SQL 的执行计划进行系统分析,以找到关键优化点进行针对性优化。
--
针对单表查询和分析场景,可以通过分析执行计划,查看[分区裁剪](../tuning/tuning-plan/optimizing-table-scanning.md)是否正常,使用[单表物化视图进行查询加速](../tuning/tuning-plan/transparent-rewriting-with-sync-mv.md)等。
+-
针对单表查询和分析场景,可以通过分析执行计划,查看[分区裁剪](../tuning/tuning-plan/optimizing-table-scanning.md)是否正常,[使用单表物化视图进行查询加速](../tuning/tuning-plan/transparent-rewriting-with-sync-mv.md)等。
- 针对复杂多表分析场景,可以分析 Join Order
是否合理等定位具体的性能瓶颈,也可以[使用多表物化视图进行透明改写](../tuning/tuning-plan/transparent-rewriting-with-async-mv.md),以加速查询。如果出现非预期的情况,比如
Join Order 不合理,通过观察 Explain 的结果,手工指定 Join Jint 进行执行计划的绑定,如[使用 Leading hint 控制
Join Order](../tuning/tuning-plan/reordering-join-with-leading-hint.md),[使用
Shuffle Hint 调整 Join shuffle
方式](../tuning/tuning-plan/adjusting-join-shuffle.md),[使用 Hint
控制代价改写行为](../tuning/tuning-plan/controlling-hints-with-cbo-rule.md)等,以达到调优执行计划的目的。
- 针对部分特定场景,还可以通过使用 Doris 提供的高级功能,比如[使用 SQL Cache
加速查询](../tuning/tuning-plan/accelerating-queries-with-sql-cache.md)。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md
index 0cdbb3676a..503c85ab42 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache.md
@@ -35,7 +35,8 @@ under the License.
## 总结
SQL Cache 是 Doris 提供的一种查询优化机制,可以显著提升查询性能。在使用的时候需要注意:
-:::tips 提示
+
+:::tip 提示
- SQL Cache 不适用于包含生成随机值的函数 (如 `random()`) 的查询,因为这会导致查询结果失去随机性。
- 目前不支持使用部分指标的缓存结果来满足查询更多指标的需求。例如,之前查询了 2 个指标的缓存不能用于查询 3 个指标的情况。
- 通过合理使用 SQL Cache,可以显著提升 Doris
的查询性能,特别是在数据更新频率较低的场景中。在实际应用中,需要根据具体的数据特征和查询模式来调整缓存参数,以获得最佳的性能提升。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md
index bb771b1955..cd22191e8a 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md
@@ -42,7 +42,6 @@ under the License.
- 不建议在同一张表上创建多个形态类似的物化视图,因为这可能会导致多个物化视图之间的冲突,从而使查询命中失败。
:::
-3.
## 案例
下面通过一个具体例子来展示使用单表物化视图进行查询加速的流程:
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]