This is an automated email from the ASF dual-hosted git repository. kassiez pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 768e146c71 [docs](typo) Version mgt and fix typo (#1362) 768e146c71 is described below commit 768e146c71fe0eae39a9219317a13f85c7830419 Author: KassieZ <139741991+kass...@users.noreply.github.com> AuthorDate: Tue Nov 19 19:59:47 2024 +0800 [docs](typo) Version mgt and fix typo (#1362) https://github.com/apache/doris-website/pull/1360 https://github.com/apache/doris-website/pull/1359 https://github.com/apache/doris-website/pull/1358 https://github.com/apache/doris-website/pull/1356 https://github.com/apache/doris-website/pull/1355 https://github.com/apache/doris-website/pull/1354 https://github.com/apache/doris-website/pull/1353 https://github.com/apache/doris-website/pull/1270 # Versions - [x] dev - [x] 3.0 - [x] 2.1 - [x] 2.0 # Languages - [x] Chinese - [x] English --- common_docs_zh/gettingStarted/what-is-new.mdx | 2 - community/join-community.md | 44 +++------------------- .../cluster-deployment/k8s-deploy/install-env.md | 4 +- .../tuning/tuning-plan/adjusting-join-shuffle.md | 14 +++++-- .../tuning/tuning-plan/optimizing-table-index.md | 9 +++-- .../solving-unevenly-data-distribution.md | 4 +- .../transparent-rewriting-with-async-mv.md | 4 +- .../transparent-rewriting-with-sync-mv.md | 38 ------------------- gettingStarted/demo-block/latest.tsx | 8 ++-- gettingStarted/what-is-apache-doris.md | 6 +-- .../current/join-community.md | 41 -------------------- .../current/query-acceleration/tuning/overview.md | 16 ++++---- .../tuning/tuning-plan/adjusting-join-shuffle.md | 14 +++++-- .../tuning/tuning-plan/dml-tuning-plan.md | 4 +- .../tuning/tuning-plan/optimizing-table-index.md | 12 +++--- .../tuning/tuning-plan/optimizing-table-schema.md | 4 +- .../reordering-join-with-leading-hint.md | 3 +- .../solving-unevenly-data-distribution.md | 4 +- .../transparent-rewriting-with-async-mv.md | 5 ++- .../transparent-rewriting-with-sync-mv.md | 37 ------------------ .../table-design/index/inverted-index.md | 2 +- .../query-acceleration/tuning/overview.md | 12 +++--- .../tuning/tuning-plan/dml-tuning-plan.md | 4 +- .../tuning/tuning-plan/optimizing-table-index.md | 4 +- .../query-acceleration/tuning/overview.md | 16 ++++---- .../tuning/tuning-plan/adjusting-join-shuffle.md | 14 +++++-- .../tuning/tuning-plan/dml-tuning-plan.md | 4 +- .../tuning/tuning-plan/optimizing-table-index.md | 13 ++++--- .../tuning/tuning-plan/optimizing-table-schema.md | 4 +- .../reordering-join-with-leading-hint.md | 2 +- .../solving-unevenly-data-distribution.md | 4 +- .../transparent-rewriting-with-async-mv.md | 5 ++- .../transparent-rewriting-with-sync-mv.md | 38 ------------------- .../table-design/index/inverted-index.md | 26 ++++++------- .../data-operate/delete/truncate-manual.md | 2 +- .../cluster-deployment/k8s-deploy/install-env.md | 4 +- .../data-operate/delete/truncate-manual.md | 2 +- .../cluster-deployment/k8s-deploy/install-env.md | 4 +- .../tuning/tuning-plan/adjusting-join-shuffle.md | 14 +++++-- .../tuning/tuning-plan/optimizing-table-index.md | 8 ++-- .../solving-unevenly-data-distribution.md | 4 +- .../transparent-rewriting-with-async-mv.md | 4 +- .../transparent-rewriting-with-sync-mv.md | 38 ------------------- .../data-operate/delete/truncate-manual.md | 2 +- .../cluster-deployment/k8s-deploy/install-env.md | 4 +- .../tuning/tuning-plan/adjusting-join-shuffle.md | 14 +++++-- .../tuning/tuning-plan/optimizing-table-index.md | 8 ++-- .../solving-unevenly-data-distribution.md | 4 +- .../transparent-rewriting-with-async-mv.md | 4 +- .../transparent-rewriting-with-sync-mv.md | 38 ------------------- 50 files changed, 179 insertions(+), 396 deletions(-) diff --git a/common_docs_zh/gettingStarted/what-is-new.mdx b/common_docs_zh/gettingStarted/what-is-new.mdx index cc46d4dd79..d72976236e 100644 --- a/common_docs_zh/gettingStarted/what-is-new.mdx +++ b/common_docs_zh/gettingStarted/what-is-new.mdx @@ -31,5 +31,3 @@ import Latest from './demo-block/latest.tsx' - - diff --git a/community/join-community.md b/community/join-community.md index be17e95586..ee6d5dc7ca 100644 --- a/community/join-community.md +++ b/community/join-community.md @@ -31,38 +31,6 @@ We have graduated from Apache incubator successfully and become an Top-Level Pro <hr /> -## 🙌 More Developers Join Us -[![Monthly Active Contributors](https://contributor-overtime-api.apiseven.com/contributors-svg?chart=contributorMonthlyActivity&repo=apache/doris)](https://www.apiseven.com/en/contributor-graph?chart=contributorMonthlyActivity&repo=apache/doris) - - - - - - - - - - - - - -[![Contributor over time](https://contributor-overtime-api.apiseven.com/contributors-svg?chart=contributorOverTime&repo=apache/doris)](https://www.apiseven.com/en/contributor-graph?chart=contributorOverTime&repo=apache/doris) - - - - - - - - - - - - - - - - ## 🌟 More Stars on Github <a href="https://star-history.com/#apache/doris&Date"> @@ -87,25 +55,25 @@ We have graduated from Apache incubator successfully and become an Top-Level Pro -##### We deeply appreciate 🔗[community contributors](https://github.com/apache/doris/graphs/contributors) for their contribution to Apache Doris. +**We deeply appreciate 🔗[community contributors](https://github.com/apache/doris/graphs/contributors) for their contribution to Apache Doris.** <hr /> -# Don't Miss Out the Latest News and Events +## Don't Miss Out the Latest News and Events Learn our latest techniques, get inspirations from our rich use cases, and see what the community has been up to ! -- ### Join our heated discussions - 💬 [Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA) 📇 [Github](https://github.com/apache/doris) +- Join our heated discussions - 💬 [Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA) 📇 [Github](https://github.com/apache/doris) -- ### Use cases and tech insight - 📭 [Twitter](https://twitter.com/doris_apache) +- Use cases and tech insight - 📭 [Twitter](https://twitter.com/doris_apache) -- ### Come and connect with us - 🌐 [LinkedIn](https://www.linkedin.com/company/doris-apache/) +- Come and connect with us - 🌐 [LinkedIn](https://www.linkedin.com/company/doris-apache/) -- ### Events Videos - ▶️ [YouTube](https://www.youtube.com/@Select_DB) 📺 [Bilibili](https://space.bilibili.com/362350065) +- Events Videos - ▶️ [YouTube](https://www.youtube.com/@Select_DB) 📺 [Bilibili](https://space.bilibili.com/362350065) diff --git a/docs/install/cluster-deployment/k8s-deploy/install-env.md b/docs/install/cluster-deployment/k8s-deploy/install-env.md index 8c16a4a1cb..e9722715bd 100644 --- a/docs/install/cluster-deployment/k8s-deploy/install-env.md +++ b/docs/install/cluster-deployment/k8s-deploy/install-env.md @@ -83,6 +83,6 @@ sysctl -w vm.max_map_count=2000000 When deploying Doris, it is recommended to turn off transparent huge pages. ```shell -echo never > /sys/kernel/mm/transparent_hugepage/enabled -echo never > /sys/kernel/mm/transparent_hugepage/defrag +echo madvise > /sys/kernel/mm/transparent_hugepage/enabled +echo madvise > /sys/kernel/mm/transparent_hugepage/defrag ``` diff --git a/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md b/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md index da26aa8b43..c235b78047 100644 --- a/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md +++ b/docs/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md @@ -28,17 +28,19 @@ under the License. Doris supports the use of hints to adjust the type of data shuffle in Join operations, thereby optimizing query performance. This section will introduce in detail how to use hints in Doris to specify the type of Join shuffle. -Currently, Doris is limited to specifying the Distribute Type for the right table in a Join, offering two types to choose from: `[shuffle]` and `[broadcast]`. The Distribute Type should be placed before the right table in the Join, and can be represented using either square brackets `[]` or double slashes `//`. +Currently, Doris is limited to specifying the Distribute Type for the right table in a Join and provides only two types to choose from: `[shuffle]` and `[broadcast]`. The Distribute Type needs to be placed before the right table in the Join, using square brackets `[]`. + Examples are as follows: ```sql SELECT COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT COUNT(*) FROM t2 JOIN /*+broadcast*/ t1 ON t1.c1 = t2.c2; +SELECT COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN /+broadcast/ t1 ON t1.c1 = t2.c2; +SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; ``` + When using hints, please note the following: 1. If Doris encounters a DistributeHint that cannot correctly generate an execution plan, it will not display the hint but will make it effective according to the "best effort" principle. Ultimately, the distribute method displayed by EXPLAIN shall be deemed as final. @@ -118,4 +120,8 @@ In the EXPLAIN results, you can see the relevant information for the distribute ## Summary -By properly using DistributeHint, you can optimize the shuffle method of Join operations and improve query performance. In practice, it is recommended to first analyze the query execution plan through EXPLAIN and then choose the appropriate shuffle type based on the actual situation. \ No newline at end of file +By reasonably using DistributeHint, you can optimize the Shuffle method for Join operations and enhance query performance. In practice, it is recommended to first analyze the query execution plan using EXPLAIN and then select the appropriate Shuffle type based on the actual situation. When using it, please pay attention to the following points: + +1. If a DistributeHint fails to correctly generate an execution plan, Doris will not display the Hint but will make it effective based on the principle of "best effort". Ultimately, the Distribute method displayed by EXPLAIN prevails. +2. In the current version, DistributeHint temporarily does not support mixing with LEADING, and the Hint will only take effect when the table specified by Distribute is located on the right side of the Join. +3. It is recommended to mix DistributeHint with ORDERED. First, use ORDERED to fix the Join order, and then specify the expected Distribute method for the corresponding Join. \ No newline at end of file diff --git a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md index f8d6160e54..6e5af4544f 100644 --- a/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md +++ b/docs/query-acceleration/tuning/tuning-plan/optimizing-table-index.md @@ -24,13 +24,14 @@ specific language governing permissions and limitations under the License. --> + Doris currently supports two types of indexes: -1. Built-in Indexes. These include prefix indexes, ZoneMap indexes, etc. +1. Built-in Indexes: These include prefix indexes, ZoneMap indexes, etc. -2. Secondary Indexes. These include inverted indexes, Bloom filter indexes, N-Gram Bloom filter indexes, and Bitmap indexes, etc. +2. Secondary Indexes: These include inverted indexes, Bloom filter indexes, N-Gram Bloom filter indexes, and Bitmap indexes, etc. -In the process of business optimization, fully analyzing business characteristics and effectively utilizing indexes can greatly enhance the effectiveness of queries and analyses, thereby achieving the purpose of performance tuning. +In the process of business optimization, fully analyzing business characteristics and make effective use of indexes can greatly enhance the effectiveness of queries and analyses, thereby achieving the purpose of performance tuning. For a detailed introduction to various indexes, please refer to the [Table Index](../../../table-design/index/index-overview) section. This chapter will demonstrate index usage techniques in several typical scenarios from the perspective of actual cases and summarize optimization suggestions for reference in business tuning. @@ -85,7 +86,7 @@ When defining the schema column order, reference the high-frequency and high-pri Doris supports inverted indexes as secondary indexes to accelerate business scenarios such as equal value, range, and full-text search of text types. The creation and management of inverted indexes are independent, allowing for convenient business performance optimization without affecting the original table schema and without the need to re-import table data. -For typical usage scenarios, syntax, and cases, please refer to the Table Index - Inverted Index section for a detailed introduction, so this chapter will not repeat the explanation. +For typical usage scenarios, syntax, and cases, please refer to the [Table Index - Inverted Index](../../../table-design/index/inverted-index) section for a detailed introduction, so this chapter will not repeat the explanation. :::tip diff --git a/docs/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md b/docs/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md index 546ec64c45..bd8f25cb3a 100644 --- a/docs/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md +++ b/docs/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md @@ -53,7 +53,7 @@ When estimating the filter rate, the optimizer typically relies on the assumptio Considering the following SQL query: ```sql -select count() +select count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` @@ -67,7 +67,7 @@ To optimize the query, we need to adjust the SQL statement based on the actual s Revising the SQL as follows: ```sql -select /* leading(orders customer) */ count() +select /* +leading(orders customer) */ count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` \ No newline at end of file diff --git a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md index bd1827dcd4..96be04741a 100644 --- a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md +++ b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md @@ -34,9 +34,11 @@ Next, through an example, we will demonstrate in detail how to use aync-material ### 1 Creation of Base Tables and Data Insertion -First, create two tables, `orders` and `lineitem`, in the TPC-H database, and insert the corresponding data. +First, create two tables, `orders` and `lineitem`, in the tpch database, and insert the corresponding data. ```sql +CREATE DATABASE IF NOT EXISTS tpch; + USE tpch; -- Create the orders table diff --git a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md index 1ea5165bac..564471a277 100644 --- a/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md +++ b/docs/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md @@ -40,44 +40,6 @@ Materialized views are suitable for the following scenarios: For queries that frequently reuse the same subquery results, a sync-materialized view can significantly enhance performance. Doris automatically maintains the data in the materialized view, ensuring data consistency between the base table and the materialized view without additional manual maintenance costs. During queries, the system automatically matches the optimal materialized view and reads data directly from it. -When using materialized views, please note the following points: - -1. In Doris version 2.0, materialized views have enhanced features. It is recommended that users confirm in a test environment whether the expected queries can hit the desired materialized views before using them in a formal production environment. - -2. It is not advisable to create multiple similar materialized views on the same table, as this may lead to conflicts between them, resulting in query misses. - -## Usage Process - -The usage process for materialized views is as follows: - -### 1 Create a Materialized View - -1. Determine the type of materialized view to create based on the characteristics of the query statement. - -2. Extract the common grouping and aggregation methods from multiple queries as the basis for defining the materialized view. - -3. It is not necessary to create materialized views for all dimension combinations; only create them for commonly used dimension combinations. - -4. Creating a materialized view is an asynchronous operation. After submitting the creation task, Doris will compute the existing data in the background until the creation is successful. - -### 2 Automatic Query Matching - -1. After the materialized view is successfully created, when a user queries the base table, Doris will automatically select an optimal materialized view and read data from it for computation. - -2. Users can use the EXPLAIN command to check whether the current query is using a materialized view. - -### 3 Update Strategy - -To ensure data consistency between the materialized view and the base table, Doris synchronizes operations on the base table to the materialized view, using incremental updates to improve update efficiency and ensuring the atomicity of operations through transactions. - -### 4 Supported Aggregation Functions - -1. SUM, MIN, MAX (applicable to Version 0.12) - -2. COUNT, BITMAP_UNION, HLL_UNION (applicable to Version 0.13) - -3. General aggregation functions (applicable to Version 2.0) - ## Tuning Usage Case The following is a specific example to illustrate the use of single-table materialized views: diff --git a/gettingStarted/demo-block/latest.tsx b/gettingStarted/demo-block/latest.tsx index 7be8c681b0..e7ae6c282d 100644 --- a/gettingStarted/demo-block/latest.tsx +++ b/gettingStarted/demo-block/latest.tsx @@ -23,7 +23,7 @@ export default function Latest() { </div> </div> */} <div className="home-page-hero-right"> - <a className="latest-button" href="https://www.meetup.com/apache-doris-meetup/events/303712793/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link"> + <a className="latest-button" href="https://www.velodb.io/events/meetup/singapore"> <div className="home-page-hero-button-label"><div>Event</div></div> <div className="latest-button-title"> {/* <div className="home-page-hero-button-icon"> @@ -32,10 +32,10 @@ export default function Latest() { <path fill="none" d="M0 0h24v24H0Z"></path> </svg> </div> */} - <div style={{ marginBottom: 10 }}>Apache Doris Meetup @ Singapore on Oct 24</div> + <div style={{ marginBottom: 10 }}>Apache Doris Meetup @ Singapore</div> </div> - <div style={{ fontSize: 12, marginBottom: 20 }}>This meetup brings together our main developers, users to discuss various topics, including technology introductions and user experiences.</div> - <div style={{ fontSize: 14, marginBottom: 10 }}>Join us</div> + <div style={{ fontSize: 12, marginBottom: 20 }}>The Apache Doris community came together for an engaging meetup in Singapore to share insights, and explore the latest developments in the world of big data and analytics.</div> + <div style={{ fontSize: 14, marginBottom: 10 }}>Watch Video & Slides</div> </a> <a className="latest-button" href={`/docs${currentVersion === '' ? '' : `/${currentVersion}`}/releasenotes/v3.0/release-3.0.2`}> <div className="home-page-hero-button-label"><div>Release</div></div> diff --git a/gettingStarted/what-is-apache-doris.md b/gettingStarted/what-is-apache-doris.md index 9c63d36f93..a0b62e6d32 100644 --- a/gettingStarted/what-is-apache-doris.md +++ b/gettingStarted/what-is-apache-doris.md @@ -21,7 +21,7 @@ specific language governing permissions and limitations under the License. --> -# What's Apache Doris + Apache Doris is an MPP-based real-time data warehouse known for its high query speed. For queries on large datasets, it returns results in sub-seconds. It supports both high-concurrent point queries and high-throughput complex analysis. It can be used for report analysis, ad-hoc queries, unified data warehouse, and data lake query acceleration. Based on Apache Doris, users can build applications for user behavior analysis, A/B testing platform, log analysis, user profile analysis, and e- [...] @@ -29,7 +29,7 @@ Apache Doris, formerly known as Palo, was initially created to support Baidu's a Apache Doris has a wide user base. It has been used in production environments of over 4000 companies worldwide, including giants such as TikTok, Baidu, Cisco, Tencent, and NetEase. It is also widely used across industries from finance, retailing, and telecommunications to energy, manufacturing, medical care, etc. -# Usage Scenarios +## Usage Scenarios The figure below shows what Apache Doris can do in a data pipeline. Data sources, after integration and processing, are ingested into the Apache Doris real-time data warehouse and offline data lakehouses such as Hive, Iceberg, and Hudi. Apache Doris can be used for the following purposes: @@ -88,7 +88,7 @@ The query engine of Apache Doris is fully vectorized, with all memory structures ![Query engine](/images/apache-doris-query-engine-2.png) -Apache Doris uses **adaptive query execution** technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate a runtime filter and push it to the probe side. Specifically, it pushes the filters to the lowest-level scan node on the probe side, which largely reduces the data amount to be processed and increases join performance. The runtime filter of Apache Doriz supports In/Min/Max/Bloom Filter. +Apache Doris uses **adaptive query execution** technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate a runtime filter and push it to the probe side. Specifically, it pushes the filters to the lowest-level scan node on the probe side, which largely reduces the data amount to be processed and increases join performance. The runtime filter of Apache Doris supports In/Min/Max/Bloom Filter. The query **optimizer** of Apache Doris is a combination of CBO and RBO. RBO supports constant folding, subquery rewriting, and predicate pushdown while CBO supports join reorder. The Apache Doris CBO is under continuous optimization for more accurate statistics collection and inference as well as a more accurate cost model. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/join-community.md b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/join-community.md index d59e2b2166..1334acdf04 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/join-community.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/join-community.md @@ -31,39 +31,6 @@ under the License. - -## 🙌 更多开发者加入我们 - -[![Monthly Active Contributors](https://contributor-overtime-api.apiseven.com/contributors-svg?chart=contributorMonthlyActivity&repo=apache/doris)](https://www.apiseven.com/en/contributor-graph?chart=contributorMonthlyActivity&repo=apache/doris) - - - - - - - - - - - - - - - -[![Contributor over time](https://contributor-overtime-api.apiseven.com/contributors-svg?chart=contributorOverTime&repo=apache/doris)](https://www.apiseven.com/en/contributor-graph?chart=contributorOverTime&repo=apache/doris) - - - - - - - - - - - - - ## 🌟 更多用户认可我们 <a href="https://star-history.com/#apache/doris&Date"> @@ -81,14 +48,6 @@ under the License. - - - - - - - - **我们非常感谢 🔗[社区贡献者](https://github.com/apache/doris/graphs/contributors) 对 Apache Doris 的大力支持!** diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/overview.md index 480a009c24..b233390fa7 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/overview.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. --> -查询性能调优是一个系统工程,需要从多层次、多维度对数据库系统进行调优。以下是调优流程和方法轮概述: +查询性能调优是一个系统工程,需要从多层次、多维度对数据库系统进行调优。以下是调优流程和方法论概述: 1. 首先,业务人员和数据库运维管理人员(DBA)需要对所使用的系统有全面的了解,这包括业务系统使用的硬件、集群的规模、使用的数据库软件版本,以及具体软件版本所提供的特性等。 @@ -87,7 +87,7 @@ Doris 提供了多种不同粒度的 Explain 工具,如 Explain Verbose、Expl 又如,两表的 Inner 等值连接中,连接条件一侧的过滤条件没有推导到另外一侧,导致没有对推导一侧的表进行提前过滤,也可能导致性能问题等。此类性能瓶颈都可以通过分析 Explain 的输出来定位和解决。 -详细信息请参考[计划调优](../../query-acceleration/tuning/tuning-plan/) +详细信息请参考[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema) ### Doris Profile @@ -123,19 +123,19 @@ Schema 设计调优基本可分为三个方面: - 特定优化手段的使用,如建立 Colocate Group 等。主要目的是排除因为 Schema 设计不合理或者没有充分利用 Doris 现有优化能力导致的性能问题 -详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/) +详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema) ### 第三步:计划调优 检查和优化完业务 Schema 后,将进入调优的主体工作,即计划调优与执行调优。如上所述,在性能调优工具中,这个阶段的主要工作是充分利用 Doris 所提供的各种层级的 Explain 工具,对慢 SQL 的执行计划进行系统分析,以找到关键优化点进行针对性优化。 -- 针对单表查询和分析场景,可以通过分析执行计划、[查看分区裁剪是否正常](../../query-acceleration/tuning/tuning-plan/optimizing-table-scanning)、是否可以[构建单表物化视图进行查询加速](../../query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv)等。 +- 针对单表查询和分析场景,可以通过分析执行计划、查看分区裁剪是否正常、是否可以构建单表物化视图进行查询加速等。 -- 针对复杂多表分析场景,可以分析统计信息是否正常、分析 Join Order 是否合理、分析 Runtime Filter 是否正常规划等,定位具体的性能瓶颈。如果出现非预期的情况,比如 Join Order 不合理,通过观察 Explain 的结果、手工指定 Join Jint 进行执行计划的绑定、[控制 Join 顺序](../../query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint)和 [Shuffle 方式](../../query-acceleration/tuning/tuning-plan/adjusting-join-shuffle)、[控制代价改写](../../query-acceleration/tuning/tuning-plan/controlling-hints-with-cbo-rule)行为等操作方法,从而达到调优执行计划的目的。 +- 针对复杂多表分析场景,可以分析统计信息是否正常、分析 Join Order 是否合理、分析 Runtime Filter 是否正常规划等,定位具体的性能瓶颈。如果出现非预期的情况,比如 Join Order 不合理,通过观察 Explain 的结果、手工指定 Join Jint 进行执行计划的绑定、控制 Join 顺序和 Shuffle 方式、控制代价改写行为等操作方法,从而达到调优执行计划的目的。 -- 针对部分特定场景,可以通过使用 Doris 提供的高级功能,比如[异步物化视图改写](../../query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv)、[SQL Cache](../../query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache) 等来加速查询。 +- 针对部分特定场景,可以通过使用 Doris 提供的高级功能,比如异步物化视图改写、SQL Cache 等来加速查询。 -详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/)。 +详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema)。 :::tip 提示 @@ -147,7 +147,7 @@ Schema 设计调优基本可分为三个方面: 在这个阶段,我们需要根据 SQL 的实际运行情况,来验证前几步的调优效果是否显著,或者发现慢 SQL 的新瓶颈点。接着,我们可以按图索骥,找到对应的性能优化方案。 -以多表分析的查询为例,我们可以通过分析 Query Profile,来检查 Join 的顺序是否合理,Runtime Filter 是否生效,以及等待时间是否合适。很多时候,执行时的调优更多是为了佐证之前的 Schema 和计划调优是否符合预期。此外,Query Profile 还能反馈出一些 BE 或机器负载的情况,例如 CPU 占用高、网络卡顿等运行状态问题。在针对这些问题进行调优时,我们需要跳出 Doris 本身,进行操作系统级别的调优。详细的调优案例,请参考[查询 Profile 分析](../../query-acceleration/tuning/query-profile)章节。 +以多表分析的查询为例,我们可以通过分析 Query Profile,来检查 Join 的顺序是否合理,Runtime Filter 是否生效,以及等待时间是否合适。很多时候,执行时的调优更多是为了佐证之前的 Schema 和计划调优是否符合预期。详细的调优案例,请参考[查询 Profile 分析](../../query-acceleration/tuning/query-profile)章节。此外,Query Profile 还能反馈出一些 BE 或机器负载的情况,例如 CPU 占用高、网络卡顿等运行状态问题。在针对这些问题进行调优时,我们需要跳出 Doris 本身,进行操作系统级别的调优。 ## 总结 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md index 1821466c47..3e66f7dec0 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md @@ -28,15 +28,15 @@ under the License. Doris 支持使用 Hint 来调整 Join 操作中数据 Shuffle 的类型,从而优化查询性能。本节将详细介绍如何在 Doris 中利用 Hint 来指定 Join Shuffle 的类型。 -目前,Doris 仅限于指定 Join 右表的 Distribute Type,并且仅提供两种类型供选择:`[shuffle] `和 `[broadcast]`。Distribute Type 需置于 Join 右表之前,可采用中括号 `[]` 或双斜杠 `//` 两种表示方式。 +目前,Doris 仅限于指定 Join 右表的 Distribute Type,并且仅提供两种类型供选择:`[shuffle] `和 `[broadcast]`。Distribute Type 需置于 Join 右表之前,可采用中括号 `[]`的方式。 示例如下: ```sql SELECT COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT COUNT(*) FROM t2 JOIN /*+broadcast*/ t1 ON t1.c1 = t2.c2; +SELECT COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN /+broadcast/ t1 ON t1.c1 = t2.c2; +SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; ``` 在使用时,需注意以下事项: @@ -118,4 +118,10 @@ EXPLAIN SHAPE PLAN SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [broadcast] t1 ON ## 总结 -通过合理使用 DistributeHint,可以优化 Join 操作的 Shuffle 方式,提升查询性能。在实践中,建议 先通过 EXPLAIN 分析查询执行计划,再根据实际情况选择合适的 Shuffle 类型。 \ No newline at end of file +通过合理使用 DistributeHint,可以优化 Join 操作的 Shuffle 方式,提升查询性能。在实践中,建议 先通过 EXPLAIN 分析查询执行计划,再根据实际情况选择合适的 Shuffle 类型。在使用时,需注意以下事项: + +1. 若遇到无法正确生成执行计划的 DistributeHint 时,Doris 不会显示该 Hint,而是会按“最大努力”原则使其生效。最终,以 EXPLAIN 显示的 Distribute 方式为准。 + +2. 在当前版本中,DistributeHint 暂不支持与 LEADING 混用,且仅当 Distribute 指定的表位于 Join 右边时,Hint 才会生效。 + +3. 建议将 DistributeHint 与 ORDERED 混用。首先利用 ORDERED 固定 Join 顺序,然后再指定相应 Join 中预期使用的 Distribute 方式。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md index 3f8838a28d..fe7d71347f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md @@ -90,7 +90,9 @@ Apache Doris 提供了多种灵活的数据导入方案,以满足不同场景 4. 监控导入状态:对于异步导入,及时查看作业的运行情况。 -通过灵活运用 Doris 提供的多种导入功能,可以高效地将各种来源的数据导入到 Doris 中进行分析。如需了解更多细节,请参考数据[导入概览](../../../data-operate/import/import-way/load-manual) +:::tip 提示 +通过灵活运用 Doris 提供的多种导入功能,可以高效地将各种来源的数据导入到 Doris 中进行分析。如需了解更多细节,请参考数据[导入概览](../../../data-operate/import/load-manual) +::: ### 导入优化 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-index.md index 65c5fcac79..59bbd7aec8 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-index.md @@ -26,11 +26,11 @@ under the License. Doris 目前支持两类索引: -1. 内置索引。包括前缀索引和 ZoneMap 索引等; +1. 内置索引:包括前缀索引和 ZoneMap 索引等; -2. 二级索引。包括倒排索引、Bloomfilter 索引、N-Gram Bloomfilter 索引和 Bitmap 索引等 +2. 二级索引:包括倒排索引、Bloomfilter 索引、N-Gram Bloomfilter 索引和 Bitmap 索引等。 -在业务优化过程中,充分分析业务特征,并有效利用索引,会大大提升查询和分析的效果,从而达到性能调优的目的。 +在业务优化过程中,充分分析业务特征并有效利用索引,会大大提升查询和分析的效果,从而达到性能调优的目的。 各类索引的详细介绍可以参考[表索引](../../../table-design/index/index-overview)章节进行了解。本章将从实际案例的角度出发,展示几种典型场景下的索引使用技巧,并总结优化建议,以供业务调优时参考。 @@ -85,7 +85,7 @@ PROPERTIES ( Doris 支持倒排索引作为二级索引,以加速等值、范围及文本类型的全文检索等业务场景。倒排索引的创建和管理是独立的,它能够在不影响原始表 Schema 和无需重新导入表数据的情况下,便捷地进行业务性能优化。 -关于典型的使用场景、语法及案例,可参考表索引 - 倒排索引,查看详细介绍,因此本章节不再重复阐述。 +关于典型的使用场景、语法及案例,可参考[表索引 - 倒排索引](../../../table-design/index/inverted-index),查看详细介绍,本章节不再重复阐述。 :::tip 优化建议 @@ -93,7 +93,7 @@ Doris 支持倒排索引作为二级索引,以加速等值、范围及文本 ::: -## 案例 3: 调整字段顺序以利用 ZoneMap 索引过滤数据 +## 案例 3: 调整字段顺序利用 ZoneMap 索引过滤数据 ZoneMap 索引是一种内置索引,它在列存数据上维护了如 Min/Max/Count 等统计信息。当 Doris 对存储层进行数据扫描时,会根据过滤字段的 ZoneMap 统计信息(如 Min/Max)快速过滤掉不满足过滤条件的数据块,从而有效减少扫表量。 @@ -154,4 +154,4 @@ PROPERTIES ( ## 总结 -在 Schema 调优中,除了表级 Schema 优化外,索引优化同样占据重要地位。Doris 提供了多种索引类型,包括前缀、ZoneMap 等内置索引,以及倒排等二级索引,这些索引为性能加速提供了强大的支持。通过合理利用这些索引,我们可以显著提升多场景下的业务查询和分析速度,这对于多场景业务查询和分析具有重要意义。 \ No newline at end of file +在 Schema 调优中,除了表级 Schema 优化外,索引优化同样占据重要地位。Doris 提供了多种索引类型,包括前缀、ZoneMap 等内置索引,以及倒排等二级索引。这些索引为性能加速提供了强大的支持,通过合理利用这些索引,我们可以显著提升多场景下的业务查询和分析速度,这对于多场景业务查询和分析具有重要意义。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md index f076329496..7f50eedc98 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. --> -Schema 设计和调优中,表的 Schema 设计是其中重要的一部分,包括表引擎选择、分区分桶列选择、分区分桶大小设置、key 列和字段类型优化等。缺乏 Schema 设计的系统,有可能会导致数据倾斜等问题,不能充分利用系统并行和排序特性,从而影响 Doris 系统在业务系统中发挥真实的性能优势。 +Schema 设计和调优中,表的 Schema 设计是其中重要的一部分,包括表引擎选择、分区分桶列选择、分区分桶大小设置、key 列和字段类型优化等。缺乏 Schema 设计的系统,有可能会导致数据倾斜等问题,不能充分利用系统并行和排序特性,从而影响 Doris 在业务系统中发挥真实的性能优势。 详细的设计原则可以参考[数据表设计](../../../table-design/overview)章节了解详细信息。本章将从实际案例的角度,展示几种典型场景下因 Schema 设计问题导致的性能瓶颈,并给出优化建议,供业务调优参考。 @@ -113,7 +113,7 @@ PROPERTIES ( 1. 在满足业务系统表达和计算需求的前提下,应优先选择定长类型,避免使用变长类型; -2. 同时,尽量采用低精类型,而非高精类型。具体实践包括:使用 BIGINT 替代 VARCHAR 或 STRING 类型的字段,以及用 FLOAT / INT / BIGINT 替换 DECIMAL 类型的字段等。此类字段类型的合理设计和优化,将极大地提升业务的计算效率,从而增强系统性能。 +2. 尽量采用低精类型,避免高精类型。具体实践包括:使用 BIGINT 替代 VARCHAR 或 STRING 类型的字段,以及用 FLOAT / INT / BIGINT 替换 DECIMAL 类型的字段等。此类字段类型的合理设计和优化,将极大地提升业务的计算效率,从而增强系统性能。 :::tip 优化建议 在定义 Schema 类型时,应遵循定长和低精优先的原则。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md index ef625d1119..0478694642 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md @@ -64,7 +64,6 @@ EXPLAIN SELECT /*+ LEADING(t2 t1) */ * FROM t1 JOIN t2 ON t1.c1 = t2.c2; ## 调优案例 -## 调优案例 **1. 左深树(默认行为)** @@ -195,4 +194,4 @@ t2 t1 2. 当升级 Doris 版本时,应重新评估 Leading Hint 的效果,因为优化器的策略可能会有所调整。 -3. 对于复杂的查询,建议使用 EXPLAIN 命令来仔细分析执行计划,以确保 Leading Hint 能达到预期的效果。 \ No newline at end of file +3. 对于复杂的查询,建议使用 EXPLAIN 命令来仔细分析执行计划,以确保 Leading Hint 能达到预期的效果。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md index b9bef8ea78..65c81ea4bb 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md @@ -53,7 +53,7 @@ SELECT COUNT(*) FROM orders JOIN customer ON o_custkey = c_custkey; 以下 SQL 查询为例: ```sql -select count() +select count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` @@ -67,7 +67,7 @@ where o_custkey = c_custkey and o_orderdate < '1920-01-02' 改写 SQL 如下: ```sql -select /* leading(orders customer) */ count() +select /* +leading(orders customer) */ count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md index e86692960c..56b69b3095 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md @@ -34,9 +34,10 @@ under the License. ### 1 创建基础表与数据插入 -首先,在 TPC-H 数据库中创建 `orders` 和 `lineitem` 两张表,并插入相应的数据。 +首先,创建 tpch 数据库并在其中创建 `orders` 和 `lineitem` 两张表,并插入相应的数据。 ```sql +CREATE DATABASE IF NOT EXISTS tpch; USE tpch; -- 创建 orders 表 @@ -221,4 +222,4 @@ DROP MATERIALIZED VIEW mv1; 6. 适用场景:物化视图适用于数据变化频率较低、查询频率较高的场景。对于经常变化的数据,实时计算可能更为合适。 -合理利用物化视图,可以显著改善数据库的查询性能,特别是在复杂查询和大数据量的情况下。同时,也需要综合考虑存储、维护等因素,以实现性能和成本的平衡。 \ No newline at end of file +合理利用物化视图,可以显著改善数据库的查询性能,特别是在复杂查询和大数据量的情况下。同时,也需要综合考虑存储、维护等因素,以实现性能和成本的平衡。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md index b0f10d0af1..e6bc647586 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md @@ -40,43 +40,6 @@ under the License. 对于频繁重复使用相同子查询结果的查询,单表同步物化视图能显著提升性能。Doris 会自动维护物化视图的数据,确保基础表 (Base Table) 和物化视图表的数据一致性,无需额外的人工维护成本。在查询时,系统会自动匹配到最优的物化视图,并直接从物化视图中读取数据。 -在使用物化视图时,请注意以下几点: - -1. 在 Doris 2.0 版本中,物化视图具备了一些增强功能。建议用户在正式的生产环境中使用物化视图之前,先在测试环境中确认预期中的查询能否命中想要创建的物化视图。 - -2. 不建议在同一张表上创建多个形态类似的物化视图,因为这可能会导致多个物化视图之间的冲突,从而使查询命中失败。 - -## 使用流程 - -物化视图的使用流程如下: - -### 1 创建物化视图 - -1. 根据查询语句的特性,确定应创建何种类型的物化视图。 - -2. 从查询语句中提炼出多个查询共有的分组及聚合方式,以此作为物化视图的定义依据。 - -3. 无需为所有维度组合都创建物化视图,仅需为常用的维度组合进行创建。 - -4. 创建物化视图是一个异步操作。提交创建任务后,Doris 会在后台对现有数据进行计算,直至创建成功。 - -### 2 查询自动匹配 - -1. 物化视图创建成功后,当用户查询 Base 表时,Doris 会自动选择一个最优的物化视图,并从该物化视图中读取数据进行计算。 - -2. 用户可通过 EXPLAIN 命令来检查当前查询是否使用了物化视图。 - -### 3 更新策略 - -为确保物化视图表与 Base 表的数据一致性,Doris 会将对 Base 表的操作同步到物化视图表中,并采用增量更新的方式提高更新效率,同时通过事务方式保证操作的原子性。 - -### 4 支持的聚合函数 - -1. SUM, MIN, MAX (适用于 Version 0.12) - -2. COUNT, BITMAP_UNION, HLL_UNION (适用于 Version 0.13) - -3. 通用聚合函数 (适用于 Version 2.0) ## 调优案例 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md index 932a04baa0..f4828e09a9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md @@ -243,7 +243,7 @@ CANCEL BUILD INDEX ON table_name (job_id1,jobid_2,...); :::tip -`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 +`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引删除任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 2.0.12 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这两个版本开始通过失败和超时机制避免一直重试。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/overview.md index ba45593463..bcd1e90552 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/overview.md @@ -87,7 +87,7 @@ Doris 提供了多种不同粒度的 Explain 工具,如 Explain Verbose、Expl 又如,两表的 Inner 等值连接中,连接条件一侧的过滤条件没有推导到另外一侧,导致没有对推导一侧的表进行提前过滤,也可能导致性能问题等。此类性能瓶颈都可以通过分析 Explain 的输出来定位和解决。 -详细信息请参考[计划调优](../../query-acceleration/tuning/tuning-plan/) +详细信息请参考[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema) ### Doris Profile @@ -123,19 +123,19 @@ Schema 设计调优基本可分为三个方面: - 特定优化手段的使用,如建立 Colocate Group 等。主要目的是排除因为 Schema 设计不合理或者没有充分利用 Doris 现有优化能力导致的性能问题 -详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/) +详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema) ### 第三步:计划调优 检查和优化完业务 Schema 后,将进入调优的主体工作,即计划调优与执行调优。如上所述,在性能调优工具中,这个阶段的主要工作是充分利用 Doris 所提供的各种层级的 Explain 工具,对慢 SQL 的执行计划进行系统分析,以找到关键优化点进行针对性优化。 -- 针对单表查询和分析场景,可以通过分析执行计划、[查看分区裁剪是否正常](../../query-acceleration/tuning/tuning-plan/optimizing-table-scanning)、是否可以[构建单表物化视图进行查询加速](../../query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv)等。 +- 针对单表查询和分析场景,可以通过分析执行计划、查看分区裁剪是否正常、是否可以构建单表物化视图进行查询加速等。 -- 针对复杂多表分析场景,可以分析统计信息是否正常、分析 Join Order 是否合理、分析 Runtime Filter 是否正常规划等,定位具体的性能瓶颈。如果出现非预期的情况,比如 Join Order 不合理,通过观察 Explain 的结果、手工指定 Join Jint 进行执行计划的绑定、[控制 Join 顺序](../../query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint)和 [Shuffle 方式](../../query-acceleration/tuning/tuning-plan/adjusting-join-shuffle)、[控制代价改写](../../query-acceleration/tuning/tuning-plan/controlling-hints-with-cbo-rule)行为等操作方法,从而达到调优执行计划的目的。 +- 针对复杂多表分析场景,可以分析统计信息是否正常、分析 Join Order 是否合理、分析 Runtime Filter 是否正常规划等,定位具体的性能瓶颈。如果出现非预期的情况,比如 Join Order 不合理,通过观察 Explain 的结果、手工指定 Join Jint 进行执行计划的绑定、控制 Join 顺序和 Shuffle 方式、控制代价改写行为等操作方法,从而达到调优执行计划的目的。 -- 针对部分特定场景,可以通过使用 Doris 提供的高级功能,比如[异步物化视图改写](../../query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv)、[SQL Cache](../../query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache) 等来加速查询。 +- 针对部分特定场景,可以通过使用 Doris 提供的高级功能,比如异步物化视图改写、SQL Cache 等来加速查询。 -详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/)。 +详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema)。 :::tip 提示 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md index 3f8838a28d..fe7d71347f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md @@ -90,7 +90,9 @@ Apache Doris 提供了多种灵活的数据导入方案,以满足不同场景 4. 监控导入状态:对于异步导入,及时查看作业的运行情况。 -通过灵活运用 Doris 提供的多种导入功能,可以高效地将各种来源的数据导入到 Doris 中进行分析。如需了解更多细节,请参考数据[导入概览](../../../data-operate/import/import-way/load-manual) +:::tip 提示 +通过灵活运用 Doris 提供的多种导入功能,可以高效地将各种来源的数据导入到 Doris 中进行分析。如需了解更多细节,请参考数据[导入概览](../../../data-operate/import/load-manual) +::: ### 导入优化 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md index 1e9794d8ad..d6e58483bc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md @@ -85,7 +85,9 @@ PROPERTIES ( Doris 支持倒排索引作为二级索引,以加速等值、范围及文本类型的全文检索等业务场景。倒排索引的创建和管理是独立的,它能够在不影响原始表 Schema 和无需重新导入表数据的情况下,便捷地进行业务性能优化。 -关于典型的使用场景、语法及案例,可参考表索引 - 倒排索引,查看详细介绍,本章节不再重复阐述。 + +关于典型的使用场景、语法及案例,可参考[表索引 - 倒排索引](../../../table-design/index/inverted-index),查看详细介绍,因此本章节不再重复阐述。 + :::tip 优化建议 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/overview.md index 480a009c24..b233390fa7 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/overview.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. --> -查询性能调优是一个系统工程,需要从多层次、多维度对数据库系统进行调优。以下是调优流程和方法轮概述: +查询性能调优是一个系统工程,需要从多层次、多维度对数据库系统进行调优。以下是调优流程和方法论概述: 1. 首先,业务人员和数据库运维管理人员(DBA)需要对所使用的系统有全面的了解,这包括业务系统使用的硬件、集群的规模、使用的数据库软件版本,以及具体软件版本所提供的特性等。 @@ -87,7 +87,7 @@ Doris 提供了多种不同粒度的 Explain 工具,如 Explain Verbose、Expl 又如,两表的 Inner 等值连接中,连接条件一侧的过滤条件没有推导到另外一侧,导致没有对推导一侧的表进行提前过滤,也可能导致性能问题等。此类性能瓶颈都可以通过分析 Explain 的输出来定位和解决。 -详细信息请参考[计划调优](../../query-acceleration/tuning/tuning-plan/) +详细信息请参考[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema) ### Doris Profile @@ -123,19 +123,19 @@ Schema 设计调优基本可分为三个方面: - 特定优化手段的使用,如建立 Colocate Group 等。主要目的是排除因为 Schema 设计不合理或者没有充分利用 Doris 现有优化能力导致的性能问题 -详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/) +详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema) ### 第三步:计划调优 检查和优化完业务 Schema 后,将进入调优的主体工作,即计划调优与执行调优。如上所述,在性能调优工具中,这个阶段的主要工作是充分利用 Doris 所提供的各种层级的 Explain 工具,对慢 SQL 的执行计划进行系统分析,以找到关键优化点进行针对性优化。 -- 针对单表查询和分析场景,可以通过分析执行计划、[查看分区裁剪是否正常](../../query-acceleration/tuning/tuning-plan/optimizing-table-scanning)、是否可以[构建单表物化视图进行查询加速](../../query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv)等。 +- 针对单表查询和分析场景,可以通过分析执行计划、查看分区裁剪是否正常、是否可以构建单表物化视图进行查询加速等。 -- 针对复杂多表分析场景,可以分析统计信息是否正常、分析 Join Order 是否合理、分析 Runtime Filter 是否正常规划等,定位具体的性能瓶颈。如果出现非预期的情况,比如 Join Order 不合理,通过观察 Explain 的结果、手工指定 Join Jint 进行执行计划的绑定、[控制 Join 顺序](../../query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint)和 [Shuffle 方式](../../query-acceleration/tuning/tuning-plan/adjusting-join-shuffle)、[控制代价改写](../../query-acceleration/tuning/tuning-plan/controlling-hints-with-cbo-rule)行为等操作方法,从而达到调优执行计划的目的。 +- 针对复杂多表分析场景,可以分析统计信息是否正常、分析 Join Order 是否合理、分析 Runtime Filter 是否正常规划等,定位具体的性能瓶颈。如果出现非预期的情况,比如 Join Order 不合理,通过观察 Explain 的结果、手工指定 Join Jint 进行执行计划的绑定、控制 Join 顺序和 Shuffle 方式、控制代价改写行为等操作方法,从而达到调优执行计划的目的。 -- 针对部分特定场景,可以通过使用 Doris 提供的高级功能,比如[异步物化视图改写](../../query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv)、[SQL Cache](../../query-acceleration/tuning/tuning-plan/accelerating-queries-with-sql-cache) 等来加速查询。 +- 针对部分特定场景,可以通过使用 Doris 提供的高级功能,比如异步物化视图改写、SQL Cache 等来加速查询。 -详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/)。 +详细调优案例请参考文档[计划调优](../../query-acceleration/tuning/tuning-plan/optimizing-table-schema)。 :::tip 提示 @@ -147,7 +147,7 @@ Schema 设计调优基本可分为三个方面: 在这个阶段,我们需要根据 SQL 的实际运行情况,来验证前几步的调优效果是否显著,或者发现慢 SQL 的新瓶颈点。接着,我们可以按图索骥,找到对应的性能优化方案。 -以多表分析的查询为例,我们可以通过分析 Query Profile,来检查 Join 的顺序是否合理,Runtime Filter 是否生效,以及等待时间是否合适。很多时候,执行时的调优更多是为了佐证之前的 Schema 和计划调优是否符合预期。此外,Query Profile 还能反馈出一些 BE 或机器负载的情况,例如 CPU 占用高、网络卡顿等运行状态问题。在针对这些问题进行调优时,我们需要跳出 Doris 本身,进行操作系统级别的调优。详细的调优案例,请参考[查询 Profile 分析](../../query-acceleration/tuning/query-profile)章节。 +以多表分析的查询为例,我们可以通过分析 Query Profile,来检查 Join 的顺序是否合理,Runtime Filter 是否生效,以及等待时间是否合适。很多时候,执行时的调优更多是为了佐证之前的 Schema 和计划调优是否符合预期。详细的调优案例,请参考[查询 Profile 分析](../../query-acceleration/tuning/query-profile)章节。此外,Query Profile 还能反馈出一些 BE 或机器负载的情况,例如 CPU 占用高、网络卡顿等运行状态问题。在针对这些问题进行调优时,我们需要跳出 Doris 本身,进行操作系统级别的调优。 ## 总结 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md index 1821466c47..3e66f7dec0 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md @@ -28,15 +28,15 @@ under the License. Doris 支持使用 Hint 来调整 Join 操作中数据 Shuffle 的类型,从而优化查询性能。本节将详细介绍如何在 Doris 中利用 Hint 来指定 Join Shuffle 的类型。 -目前,Doris 仅限于指定 Join 右表的 Distribute Type,并且仅提供两种类型供选择:`[shuffle] `和 `[broadcast]`。Distribute Type 需置于 Join 右表之前,可采用中括号 `[]` 或双斜杠 `//` 两种表示方式。 +目前,Doris 仅限于指定 Join 右表的 Distribute Type,并且仅提供两种类型供选择:`[shuffle] `和 `[broadcast]`。Distribute Type 需置于 Join 右表之前,可采用中括号 `[]`的方式。 示例如下: ```sql SELECT COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT COUNT(*) FROM t2 JOIN /*+broadcast*/ t1 ON t1.c1 = t2.c2; +SELECT COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN /+broadcast/ t1 ON t1.c1 = t2.c2; +SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; ``` 在使用时,需注意以下事项: @@ -118,4 +118,10 @@ EXPLAIN SHAPE PLAN SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [broadcast] t1 ON ## 总结 -通过合理使用 DistributeHint,可以优化 Join 操作的 Shuffle 方式,提升查询性能。在实践中,建议 先通过 EXPLAIN 分析查询执行计划,再根据实际情况选择合适的 Shuffle 类型。 \ No newline at end of file +通过合理使用 DistributeHint,可以优化 Join 操作的 Shuffle 方式,提升查询性能。在实践中,建议 先通过 EXPLAIN 分析查询执行计划,再根据实际情况选择合适的 Shuffle 类型。在使用时,需注意以下事项: + +1. 若遇到无法正确生成执行计划的 DistributeHint 时,Doris 不会显示该 Hint,而是会按“最大努力”原则使其生效。最终,以 EXPLAIN 显示的 Distribute 方式为准。 + +2. 在当前版本中,DistributeHint 暂不支持与 LEADING 混用,且仅当 Distribute 指定的表位于 Join 右边时,Hint 才会生效。 + +3. 建议将 DistributeHint 与 ORDERED 混用。首先利用 ORDERED 固定 Join 顺序,然后再指定相应 Join 中预期使用的 Distribute 方式。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md index 3f8838a28d..fe7d71347f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/dml-tuning-plan.md @@ -90,7 +90,9 @@ Apache Doris 提供了多种灵活的数据导入方案,以满足不同场景 4. 监控导入状态:对于异步导入,及时查看作业的运行情况。 -通过灵活运用 Doris 提供的多种导入功能,可以高效地将各种来源的数据导入到 Doris 中进行分析。如需了解更多细节,请参考数据[导入概览](../../../data-operate/import/import-way/load-manual) +:::tip 提示 +通过灵活运用 Doris 提供的多种导入功能,可以高效地将各种来源的数据导入到 Doris 中进行分析。如需了解更多细节,请参考数据[导入概览](../../../data-operate/import/load-manual) +::: ### 导入优化 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md index 65c5fcac79..d9b6f6d915 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md @@ -24,13 +24,14 @@ specific language governing permissions and limitations under the License. --> + Doris 目前支持两类索引: -1. 内置索引。包括前缀索引和 ZoneMap 索引等; +1. 内置索引:包括前缀索引和 ZoneMap 索引等; -2. 二级索引。包括倒排索引、Bloomfilter 索引、N-Gram Bloomfilter 索引和 Bitmap 索引等 +2. 二级索引:包括倒排索引、Bloomfilter 索引、N-Gram Bloomfilter 索引和 Bitmap 索引等。 -在业务优化过程中,充分分析业务特征,并有效利用索引,会大大提升查询和分析的效果,从而达到性能调优的目的。 +在业务优化过程中,充分分析业务特征并有效利用索引,会大大提升查询和分析的效果,从而达到性能调优的目的。 各类索引的详细介绍可以参考[表索引](../../../table-design/index/index-overview)章节进行了解。本章将从实际案例的角度出发,展示几种典型场景下的索引使用技巧,并总结优化建议,以供业务调优时参考。 @@ -85,7 +86,7 @@ PROPERTIES ( Doris 支持倒排索引作为二级索引,以加速等值、范围及文本类型的全文检索等业务场景。倒排索引的创建和管理是独立的,它能够在不影响原始表 Schema 和无需重新导入表数据的情况下,便捷地进行业务性能优化。 -关于典型的使用场景、语法及案例,可参考表索引 - 倒排索引,查看详细介绍,因此本章节不再重复阐述。 +关于典型的使用场景、语法及案例,可参考[表索引 - 倒排索引](../../../table-design/index/inverted-index),查看详细介绍,本章节不再重复阐述。 :::tip 优化建议 @@ -93,7 +94,7 @@ Doris 支持倒排索引作为二级索引,以加速等值、范围及文本 ::: -## 案例 3: 调整字段顺序以利用 ZoneMap 索引过滤数据 +## 案例 3: 调整字段顺序利用 ZoneMap 索引过滤数据 ZoneMap 索引是一种内置索引,它在列存数据上维护了如 Min/Max/Count 等统计信息。当 Doris 对存储层进行数据扫描时,会根据过滤字段的 ZoneMap 统计信息(如 Min/Max)快速过滤掉不满足过滤条件的数据块,从而有效减少扫表量。 @@ -154,4 +155,4 @@ PROPERTIES ( ## 总结 -在 Schema 调优中,除了表级 Schema 优化外,索引优化同样占据重要地位。Doris 提供了多种索引类型,包括前缀、ZoneMap 等内置索引,以及倒排等二级索引,这些索引为性能加速提供了强大的支持。通过合理利用这些索引,我们可以显著提升多场景下的业务查询和分析速度,这对于多场景业务查询和分析具有重要意义。 \ No newline at end of file +在 Schema 调优中,除了表级 Schema 优化外,索引优化同样占据重要地位。Doris 提供了多种索引类型,包括前缀、ZoneMap 等内置索引,以及倒排等二级索引。这些索引为性能加速提供了强大的支持,通过合理利用这些索引,我们可以显著提升多场景下的业务查询和分析速度,这对于多场景业务查询和分析具有重要意义。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md index f076329496..7f50eedc98 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-schema.md @@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License. --> -Schema 设计和调优中,表的 Schema 设计是其中重要的一部分,包括表引擎选择、分区分桶列选择、分区分桶大小设置、key 列和字段类型优化等。缺乏 Schema 设计的系统,有可能会导致数据倾斜等问题,不能充分利用系统并行和排序特性,从而影响 Doris 系统在业务系统中发挥真实的性能优势。 +Schema 设计和调优中,表的 Schema 设计是其中重要的一部分,包括表引擎选择、分区分桶列选择、分区分桶大小设置、key 列和字段类型优化等。缺乏 Schema 设计的系统,有可能会导致数据倾斜等问题,不能充分利用系统并行和排序特性,从而影响 Doris 在业务系统中发挥真实的性能优势。 详细的设计原则可以参考[数据表设计](../../../table-design/overview)章节了解详细信息。本章将从实际案例的角度,展示几种典型场景下因 Schema 设计问题导致的性能瓶颈,并给出优化建议,供业务调优参考。 @@ -113,7 +113,7 @@ PROPERTIES ( 1. 在满足业务系统表达和计算需求的前提下,应优先选择定长类型,避免使用变长类型; -2. 同时,尽量采用低精类型,而非高精类型。具体实践包括:使用 BIGINT 替代 VARCHAR 或 STRING 类型的字段,以及用 FLOAT / INT / BIGINT 替换 DECIMAL 类型的字段等。此类字段类型的合理设计和优化,将极大地提升业务的计算效率,从而增强系统性能。 +2. 尽量采用低精类型,避免高精类型。具体实践包括:使用 BIGINT 替代 VARCHAR 或 STRING 类型的字段,以及用 FLOAT / INT / BIGINT 替换 DECIMAL 类型的字段等。此类字段类型的合理设计和优化,将极大地提升业务的计算效率,从而增强系统性能。 :::tip 优化建议 在定义 Schema 类型时,应遵循定长和低精优先的原则。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md index ef625d1119..c8101e4be0 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/reordering-join-with-leading-hint.md @@ -64,7 +64,7 @@ EXPLAIN SELECT /*+ LEADING(t2 t1) */ * FROM t1 JOIN t2 ON t1.c1 = t2.c2; ## 调优案例 -## 调优案例 + **1. 左深树(默认行为)** diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md index b9bef8ea78..65c81ea4bb 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md @@ -53,7 +53,7 @@ SELECT COUNT(*) FROM orders JOIN customer ON o_custkey = c_custkey; 以下 SQL 查询为例: ```sql -select count() +select count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` @@ -67,7 +67,7 @@ where o_custkey = c_custkey and o_orderdate < '1920-01-02' 改写 SQL 如下: ```sql -select /* leading(orders customer) */ count() +select /* +leading(orders customer) */ count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md index e86692960c..56b69b3095 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md @@ -34,9 +34,10 @@ under the License. ### 1 创建基础表与数据插入 -首先,在 TPC-H 数据库中创建 `orders` 和 `lineitem` 两张表,并插入相应的数据。 +首先,创建 tpch 数据库并在其中创建 `orders` 和 `lineitem` 两张表,并插入相应的数据。 ```sql +CREATE DATABASE IF NOT EXISTS tpch; USE tpch; -- 创建 orders 表 @@ -221,4 +222,4 @@ DROP MATERIALIZED VIEW mv1; 6. 适用场景:物化视图适用于数据变化频率较低、查询频率较高的场景。对于经常变化的数据,实时计算可能更为合适。 -合理利用物化视图,可以显著改善数据库的查询性能,特别是在复杂查询和大数据量的情况下。同时,也需要综合考虑存储、维护等因素,以实现性能和成本的平衡。 \ No newline at end of file +合理利用物化视图,可以显著改善数据库的查询性能,特别是在复杂查询和大数据量的情况下。同时,也需要综合考虑存储、维护等因素,以实现性能和成本的平衡。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md index b0f10d0af1..f6ef4eca7a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md @@ -40,44 +40,6 @@ under the License. 对于频繁重复使用相同子查询结果的查询,单表同步物化视图能显著提升性能。Doris 会自动维护物化视图的数据,确保基础表 (Base Table) 和物化视图表的数据一致性,无需额外的人工维护成本。在查询时,系统会自动匹配到最优的物化视图,并直接从物化视图中读取数据。 -在使用物化视图时,请注意以下几点: - -1. 在 Doris 2.0 版本中,物化视图具备了一些增强功能。建议用户在正式的生产环境中使用物化视图之前,先在测试环境中确认预期中的查询能否命中想要创建的物化视图。 - -2. 不建议在同一张表上创建多个形态类似的物化视图,因为这可能会导致多个物化视图之间的冲突,从而使查询命中失败。 - -## 使用流程 - -物化视图的使用流程如下: - -### 1 创建物化视图 - -1. 根据查询语句的特性,确定应创建何种类型的物化视图。 - -2. 从查询语句中提炼出多个查询共有的分组及聚合方式,以此作为物化视图的定义依据。 - -3. 无需为所有维度组合都创建物化视图,仅需为常用的维度组合进行创建。 - -4. 创建物化视图是一个异步操作。提交创建任务后,Doris 会在后台对现有数据进行计算,直至创建成功。 - -### 2 查询自动匹配 - -1. 物化视图创建成功后,当用户查询 Base 表时,Doris 会自动选择一个最优的物化视图,并从该物化视图中读取数据进行计算。 - -2. 用户可通过 EXPLAIN 命令来检查当前查询是否使用了物化视图。 - -### 3 更新策略 - -为确保物化视图表与 Base 表的数据一致性,Doris 会将对 Base 表的操作同步到物化视图表中,并采用增量更新的方式提高更新效率,同时通过事务方式保证操作的原子性。 - -### 4 支持的聚合函数 - -1. SUM, MIN, MAX (适用于 Version 0.12) - -2. COUNT, BITMAP_UNION, HLL_UNION (适用于 Version 0.13) - -3. 通用聚合函数 (适用于 Version 2.0) - ## 调优案例 下面通过一个具体例子来阐述单表物化视图的使用方法: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md index d4621cd67d..672a13343f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md @@ -47,7 +47,7 @@ under the License. - 支持短语查询 `MATCH_PHRASE` - 支持指定词距 `slop` - - 支持短语+前缀 `MATCH_PHRASE_PREFIX` + - 支持短语 + 前缀 `MATCH_PHRASE_PREFIX` - 支持分词正则查询 `MATCH_REGEXP` @@ -175,7 +175,7 @@ table_properties; <details> <summary>ignore_above</summary> - **用于指定不分词字符串索引(没有指定parser)的长度限制** + **用于指定不分词字符串索引(没有指定 parser)的长度限制** <p>- 长度超过 ignore_above 设置的字符串不会被索引。对于字符串数组,ignore_above 将分别应用于每个数组元素,长度超过 ignore_above 的字符串元素将不被索引。</p> <p>- 默认为 256,单位是字节</p> @@ -243,7 +243,7 @@ CANCEL BUILD INDEX ON table_name (job_id1,jobid_2,...); :::tip -`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是3。 +`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 2.0.12 和 2.1.4 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这两个版本开始通过失败和超时机制避免一直重试。3.0 存算分离模式暂不支持此命令。 @@ -265,7 +265,7 @@ ALTER TABLE table_name DROP INDEX idx_name; :::tip -`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是3。 +`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引删除任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 ::: @@ -287,30 +287,30 @@ SELECT * FROM table_name WHERE content MATCH_ALL 'keyword1 keyword2'; -- 2. 全文检索短语匹配,通过 MATCH_PHRASE 完成 -- 2.1 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword2 必须紧跟在 keyword1 后面 --- 'keyword1 keyword2','wordx keyword1 keyword2','wordx keyword1 keyword2 wordy' 能匹配,因为他们都包含keyword1 keyword2,而且keyword2 紧跟在 keyword1 后面 +-- 'keyword1 keyword2','wordx keyword1 keyword2','wordx keyword1 keyword2 wordy' 能匹配,因为他们都包含 keyword1 keyword2,而且 keyword2 紧跟在 keyword1 后面 -- 'keyword1 wordx keyword2' 不能匹配,因为 keyword1 keyword2 之间隔了一个词 wordx -- 'keyword2 keyword1',因为 keyword1 keyword2 的顺序反了 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2'; --- 2.2 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword1 keyword2 的 `词距`(slop) 不超过3 --- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' 都能匹配,因为keyword1 keyword2中间隔的词分别是0 1 3 都不超过3 --- 'keyword1 a b c d keyword2' 不能能匹配,因为keyword1 keyword2中间隔的词有4个,超过3 --- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' 也能匹配,因为指定 slop > 0 时不再要求keyword1 keyword2 的顺序。这个行为参考了 ES,与直觉的预期不一样,因此 Doris 提供了在 slop 后面指定正数符号(+)表示需要保持 keyword1 keyword2 的先后顺序 +-- 2.2 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword1 keyword2 的 `词距`(slop)不超过 3 +-- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' 都能匹配,因为 keyword1 keyword2 中间隔的词分别是 0 1 3 都不超过 3 +-- 'keyword1 a b c d keyword2' 不能能匹配,因为 keyword1 keyword2 中间隔的词有 4 个,超过 3 +-- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' 也能匹配,因为指定 slop > 0 时不再要求 keyword1 keyword2 的顺序。这个行为参考了 ES,与直觉的预期不一样,因此 Doris 提供了在 slop 后面指定正数符号(+)表示需要保持 keyword1 keyword2 的先后顺序 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3'; -- slop 指定正号,'keyword1 a b c keyword2' 能匹配,而 'keyword2 a b c keyword1' 不能匹配 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3+'; --- 2.3 在保持词顺序的前提下,对最后一个词keyword2做前缀匹配,默认找50个前缀词(session变量inverted_index_max_expansions控制) --- 'keyword1 keyword2abc' 能匹配,因为keyword1完全一样,最后一个 keyword2abc 是 keyword2 的前缀 +-- 2.3 在保持词顺序的前提下,对最后一个词 keyword2 做前缀匹配,默认找 50 个前缀词(session 变量 inverted_index_max_expansions 控制) +-- 'keyword1 keyword2abc' 能匹配,因为 keyword1 完全一样,最后一个 keyword2abc 是 keyword2 的前缀 -- 'keyword1 keyword2' 也能匹配,因为 keyword2 也是 keyword2 的前缀 -- 'keyword1 keyword3' 不能匹配,因为 keyword3 不是 keyword2 的前缀 -- 'keyword1 keyword3abc' 也不能匹配,因为 keyword3abc 也不是 keyword2 的前缀 SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1 keyword2'; --- 2.4 如果只填一个词会退化为前缀查询,默认找50个前缀词(session变量inverted_index_max_expansions控制) +-- 2.4 如果只填一个词会退化为前缀查询,默认找 50 个前缀词(session 变量 inverted_index_max_expansions 控制) SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1'; --- 2.5 对分词后的词进行正则匹配,默认匹配50个(session变量inverted_index_max_expansions控制) +-- 2.5 对分词后的词进行正则匹配,默认匹配 50 个(session 变量 inverted_index_max_expansions 控制) -- 类似 MATCH_PHRASE_PREFIX 的匹配规则,只是前缀变成了正则 SELECT * FROM table_name WHERE content MATCH_REGEXP 'key*'; diff --git a/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md b/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md index c0fb7f8e26..010a7da74f 100644 --- a/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md +++ b/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md @@ -37,7 +37,7 @@ TRUNCATE TABLE [db.]tbl[ PARTITION(p1, p2, ...)]; - This statement only clears the data within a table or partition but preserves the table or partition itself. - Unlike DELETE, this statement can only clear the specified table or partition as a whole and cannot be added with filter conditions. - Unlike DELETE, truncating data will not affect query performance. -- The data deleted by this operation is not recoverable. +- The data deleted by this operation can be recovered through the RECOVER statement(for a period of time). See [RECOVER](../../../../sql-manual/sql-statements/Database-Administration-Statements/RECOVER) statement for details. If you execute command with FORCE, the data will be deleted directly and cannot be recovered, this operation is generally not recommended. - When using this command, the table status must be NORMAL, which means that tables undergoing SCHEMA CHANGE can not be truncated. - This command may cause ongoing imports to fail. diff --git a/versioned_docs/version-2.0/install/cluster-deployment/k8s-deploy/install-env.md b/versioned_docs/version-2.0/install/cluster-deployment/k8s-deploy/install-env.md index 225d943e6d..045e281c05 100644 --- a/versioned_docs/version-2.0/install/cluster-deployment/k8s-deploy/install-env.md +++ b/versioned_docs/version-2.0/install/cluster-deployment/k8s-deploy/install-env.md @@ -83,6 +83,6 @@ sysctl -w vm.max_map_count=2000000 When deploying Doris, it is recommended to turn off transparent huge pages. ```shell -echo never > /sys/kernel/mm/transparent_hugepage/enabled -echo never > /sys/kernel/mm/transparent_hugepage/defrag +echo madvise > /sys/kernel/mm/transparent_hugepage/enabled +echo madvise > /sys/kernel/mm/transparent_hugepage/defrag ``` diff --git a/versioned_docs/version-2.1/data-operate/delete/truncate-manual.md b/versioned_docs/version-2.1/data-operate/delete/truncate-manual.md index c0fb7f8e26..010a7da74f 100644 --- a/versioned_docs/version-2.1/data-operate/delete/truncate-manual.md +++ b/versioned_docs/version-2.1/data-operate/delete/truncate-manual.md @@ -37,7 +37,7 @@ TRUNCATE TABLE [db.]tbl[ PARTITION(p1, p2, ...)]; - This statement only clears the data within a table or partition but preserves the table or partition itself. - Unlike DELETE, this statement can only clear the specified table or partition as a whole and cannot be added with filter conditions. - Unlike DELETE, truncating data will not affect query performance. -- The data deleted by this operation is not recoverable. +- The data deleted by this operation can be recovered through the RECOVER statement(for a period of time). See [RECOVER](../../../../sql-manual/sql-statements/Database-Administration-Statements/RECOVER) statement for details. If you execute command with FORCE, the data will be deleted directly and cannot be recovered, this operation is generally not recommended. - When using this command, the table status must be NORMAL, which means that tables undergoing SCHEMA CHANGE can not be truncated. - This command may cause ongoing imports to fail. diff --git a/versioned_docs/version-2.1/install/cluster-deployment/k8s-deploy/install-env.md b/versioned_docs/version-2.1/install/cluster-deployment/k8s-deploy/install-env.md index e9e4a99b88..737b4d9ab4 100644 --- a/versioned_docs/version-2.1/install/cluster-deployment/k8s-deploy/install-env.md +++ b/versioned_docs/version-2.1/install/cluster-deployment/k8s-deploy/install-env.md @@ -83,6 +83,6 @@ sysctl -w vm.max_map_count=2000000 When deploying Doris, it is recommended to turn off transparent huge pages. ```shell -echo never > /sys/kernel/mm/transparent_hugepage/enabled -echo never > /sys/kernel/mm/transparent_hugepage/defrag +echo madvise > /sys/kernel/mm/transparent_hugepage/enabled +echo madvise > /sys/kernel/mm/transparent_hugepage/defrag ``` diff --git a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md index da26aa8b43..c235b78047 100644 --- a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md +++ b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md @@ -28,17 +28,19 @@ under the License. Doris supports the use of hints to adjust the type of data shuffle in Join operations, thereby optimizing query performance. This section will introduce in detail how to use hints in Doris to specify the type of Join shuffle. -Currently, Doris is limited to specifying the Distribute Type for the right table in a Join, offering two types to choose from: `[shuffle]` and `[broadcast]`. The Distribute Type should be placed before the right table in the Join, and can be represented using either square brackets `[]` or double slashes `//`. +Currently, Doris is limited to specifying the Distribute Type for the right table in a Join and provides only two types to choose from: `[shuffle]` and `[broadcast]`. The Distribute Type needs to be placed before the right table in the Join, using square brackets `[]`. + Examples are as follows: ```sql SELECT COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT COUNT(*) FROM t2 JOIN /*+broadcast*/ t1 ON t1.c1 = t2.c2; +SELECT COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN /+broadcast/ t1 ON t1.c1 = t2.c2; +SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; ``` + When using hints, please note the following: 1. If Doris encounters a DistributeHint that cannot correctly generate an execution plan, it will not display the hint but will make it effective according to the "best effort" principle. Ultimately, the distribute method displayed by EXPLAIN shall be deemed as final. @@ -118,4 +120,8 @@ In the EXPLAIN results, you can see the relevant information for the distribute ## Summary -By properly using DistributeHint, you can optimize the shuffle method of Join operations and improve query performance. In practice, it is recommended to first analyze the query execution plan through EXPLAIN and then choose the appropriate shuffle type based on the actual situation. \ No newline at end of file +By reasonably using DistributeHint, you can optimize the Shuffle method for Join operations and enhance query performance. In practice, it is recommended to first analyze the query execution plan using EXPLAIN and then select the appropriate Shuffle type based on the actual situation. When using it, please pay attention to the following points: + +1. If a DistributeHint fails to correctly generate an execution plan, Doris will not display the Hint but will make it effective based on the principle of "best effort". Ultimately, the Distribute method displayed by EXPLAIN prevails. +2. In the current version, DistributeHint temporarily does not support mixing with LEADING, and the Hint will only take effect when the table specified by Distribute is located on the right side of the Join. +3. It is recommended to mix DistributeHint with ORDERED. First, use ORDERED to fix the Join order, and then specify the expected Distribute method for the corresponding Join. \ No newline at end of file diff --git a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md index f8d6160e54..bdf73020a5 100644 --- a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md +++ b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/optimizing-table-index.md @@ -26,11 +26,11 @@ under the License. Doris currently supports two types of indexes: -1. Built-in Indexes. These include prefix indexes, ZoneMap indexes, etc. +1. Built-in Indexes: These include prefix indexes, ZoneMap indexes, etc. -2. Secondary Indexes. These include inverted indexes, Bloom filter indexes, N-Gram Bloom filter indexes, and Bitmap indexes, etc. +2. Secondary Indexes: These include inverted indexes, Bloom filter indexes, N-Gram Bloom filter indexes, and Bitmap indexes, etc. -In the process of business optimization, fully analyzing business characteristics and effectively utilizing indexes can greatly enhance the effectiveness of queries and analyses, thereby achieving the purpose of performance tuning. +In the process of business optimization, fully analyzing business characteristics and make effective use of indexes can greatly enhance the effectiveness of queries and analyses, thereby achieving the purpose of performance tuning. For a detailed introduction to various indexes, please refer to the [Table Index](../../../table-design/index/index-overview) section. This chapter will demonstrate index usage techniques in several typical scenarios from the perspective of actual cases and summarize optimization suggestions for reference in business tuning. @@ -85,7 +85,7 @@ When defining the schema column order, reference the high-frequency and high-pri Doris supports inverted indexes as secondary indexes to accelerate business scenarios such as equal value, range, and full-text search of text types. The creation and management of inverted indexes are independent, allowing for convenient business performance optimization without affecting the original table schema and without the need to re-import table data. -For typical usage scenarios, syntax, and cases, please refer to the Table Index - Inverted Index section for a detailed introduction, so this chapter will not repeat the explanation. +For typical usage scenarios, syntax, and cases, please refer to the [Table Index - Inverted Index](../../../table-design/index/inverted-index) section for a detailed introduction, so this chapter will not repeat the explanation. :::tip diff --git a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md index 546ec64c45..bd8f25cb3a 100644 --- a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md +++ b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md @@ -53,7 +53,7 @@ When estimating the filter rate, the optimizer typically relies on the assumptio Considering the following SQL query: ```sql -select count() +select count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` @@ -67,7 +67,7 @@ To optimize the query, we need to adjust the SQL statement based on the actual s Revising the SQL as follows: ```sql -select /* leading(orders customer) */ count() +select /* +leading(orders customer) */ count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` \ No newline at end of file diff --git a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md index bd1827dcd4..96be04741a 100644 --- a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md +++ b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md @@ -34,9 +34,11 @@ Next, through an example, we will demonstrate in detail how to use aync-material ### 1 Creation of Base Tables and Data Insertion -First, create two tables, `orders` and `lineitem`, in the TPC-H database, and insert the corresponding data. +First, create two tables, `orders` and `lineitem`, in the tpch database, and insert the corresponding data. ```sql +CREATE DATABASE IF NOT EXISTS tpch; + USE tpch; -- Create the orders table diff --git a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md index 1ea5165bac..564471a277 100644 --- a/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md +++ b/versioned_docs/version-2.1/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md @@ -40,44 +40,6 @@ Materialized views are suitable for the following scenarios: For queries that frequently reuse the same subquery results, a sync-materialized view can significantly enhance performance. Doris automatically maintains the data in the materialized view, ensuring data consistency between the base table and the materialized view without additional manual maintenance costs. During queries, the system automatically matches the optimal materialized view and reads data directly from it. -When using materialized views, please note the following points: - -1. In Doris version 2.0, materialized views have enhanced features. It is recommended that users confirm in a test environment whether the expected queries can hit the desired materialized views before using them in a formal production environment. - -2. It is not advisable to create multiple similar materialized views on the same table, as this may lead to conflicts between them, resulting in query misses. - -## Usage Process - -The usage process for materialized views is as follows: - -### 1 Create a Materialized View - -1. Determine the type of materialized view to create based on the characteristics of the query statement. - -2. Extract the common grouping and aggregation methods from multiple queries as the basis for defining the materialized view. - -3. It is not necessary to create materialized views for all dimension combinations; only create them for commonly used dimension combinations. - -4. Creating a materialized view is an asynchronous operation. After submitting the creation task, Doris will compute the existing data in the background until the creation is successful. - -### 2 Automatic Query Matching - -1. After the materialized view is successfully created, when a user queries the base table, Doris will automatically select an optimal materialized view and read data from it for computation. - -2. Users can use the EXPLAIN command to check whether the current query is using a materialized view. - -### 3 Update Strategy - -To ensure data consistency between the materialized view and the base table, Doris synchronizes operations on the base table to the materialized view, using incremental updates to improve update efficiency and ensuring the atomicity of operations through transactions. - -### 4 Supported Aggregation Functions - -1. SUM, MIN, MAX (applicable to Version 0.12) - -2. COUNT, BITMAP_UNION, HLL_UNION (applicable to Version 0.13) - -3. General aggregation functions (applicable to Version 2.0) - ## Tuning Usage Case The following is a specific example to illustrate the use of single-table materialized views: diff --git a/versioned_docs/version-3.0/data-operate/delete/truncate-manual.md b/versioned_docs/version-3.0/data-operate/delete/truncate-manual.md index c0fb7f8e26..010a7da74f 100644 --- a/versioned_docs/version-3.0/data-operate/delete/truncate-manual.md +++ b/versioned_docs/version-3.0/data-operate/delete/truncate-manual.md @@ -37,7 +37,7 @@ TRUNCATE TABLE [db.]tbl[ PARTITION(p1, p2, ...)]; - This statement only clears the data within a table or partition but preserves the table or partition itself. - Unlike DELETE, this statement can only clear the specified table or partition as a whole and cannot be added with filter conditions. - Unlike DELETE, truncating data will not affect query performance. -- The data deleted by this operation is not recoverable. +- The data deleted by this operation can be recovered through the RECOVER statement(for a period of time). See [RECOVER](../../../../sql-manual/sql-statements/Database-Administration-Statements/RECOVER) statement for details. If you execute command with FORCE, the data will be deleted directly and cannot be recovered, this operation is generally not recommended. - When using this command, the table status must be NORMAL, which means that tables undergoing SCHEMA CHANGE can not be truncated. - This command may cause ongoing imports to fail. diff --git a/versioned_docs/version-3.0/install/cluster-deployment/k8s-deploy/install-env.md b/versioned_docs/version-3.0/install/cluster-deployment/k8s-deploy/install-env.md index 8c16a4a1cb..e9722715bd 100644 --- a/versioned_docs/version-3.0/install/cluster-deployment/k8s-deploy/install-env.md +++ b/versioned_docs/version-3.0/install/cluster-deployment/k8s-deploy/install-env.md @@ -83,6 +83,6 @@ sysctl -w vm.max_map_count=2000000 When deploying Doris, it is recommended to turn off transparent huge pages. ```shell -echo never > /sys/kernel/mm/transparent_hugepage/enabled -echo never > /sys/kernel/mm/transparent_hugepage/defrag +echo madvise > /sys/kernel/mm/transparent_hugepage/enabled +echo madvise > /sys/kernel/mm/transparent_hugepage/defrag ``` diff --git a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md index da26aa8b43..c235b78047 100644 --- a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md +++ b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/adjusting-join-shuffle.md @@ -28,17 +28,19 @@ under the License. Doris supports the use of hints to adjust the type of data shuffle in Join operations, thereby optimizing query performance. This section will introduce in detail how to use hints in Doris to specify the type of Join shuffle. -Currently, Doris is limited to specifying the Distribute Type for the right table in a Join, offering two types to choose from: `[shuffle]` and `[broadcast]`. The Distribute Type should be placed before the right table in the Join, and can be represented using either square brackets `[]` or double slashes `//`. +Currently, Doris is limited to specifying the Distribute Type for the right table in a Join and provides only two types to choose from: `[shuffle]` and `[broadcast]`. The Distribute Type needs to be placed before the right table in the Join, using square brackets `[]`. + Examples are as follows: ```sql SELECT COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT COUNT(*) FROM t2 JOIN /*+broadcast*/ t1 ON t1.c1 = t2.c2; +SELECT COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2; -SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN /+broadcast/ t1 ON t1.c1 = t2.c2; +SELECT /*+ ordered */ COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2; ``` + When using hints, please note the following: 1. If Doris encounters a DistributeHint that cannot correctly generate an execution plan, it will not display the hint but will make it effective according to the "best effort" principle. Ultimately, the distribute method displayed by EXPLAIN shall be deemed as final. @@ -118,4 +120,8 @@ In the EXPLAIN results, you can see the relevant information for the distribute ## Summary -By properly using DistributeHint, you can optimize the shuffle method of Join operations and improve query performance. In practice, it is recommended to first analyze the query execution plan through EXPLAIN and then choose the appropriate shuffle type based on the actual situation. \ No newline at end of file +By reasonably using DistributeHint, you can optimize the Shuffle method for Join operations and enhance query performance. In practice, it is recommended to first analyze the query execution plan using EXPLAIN and then select the appropriate Shuffle type based on the actual situation. When using it, please pay attention to the following points: + +1. If a DistributeHint fails to correctly generate an execution plan, Doris will not display the Hint but will make it effective based on the principle of "best effort". Ultimately, the Distribute method displayed by EXPLAIN prevails. +2. In the current version, DistributeHint temporarily does not support mixing with LEADING, and the Hint will only take effect when the table specified by Distribute is located on the right side of the Join. +3. It is recommended to mix DistributeHint with ORDERED. First, use ORDERED to fix the Join order, and then specify the expected Distribute method for the corresponding Join. \ No newline at end of file diff --git a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md index f8d6160e54..bdf73020a5 100644 --- a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md +++ b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/optimizing-table-index.md @@ -26,11 +26,11 @@ under the License. Doris currently supports two types of indexes: -1. Built-in Indexes. These include prefix indexes, ZoneMap indexes, etc. +1. Built-in Indexes: These include prefix indexes, ZoneMap indexes, etc. -2. Secondary Indexes. These include inverted indexes, Bloom filter indexes, N-Gram Bloom filter indexes, and Bitmap indexes, etc. +2. Secondary Indexes: These include inverted indexes, Bloom filter indexes, N-Gram Bloom filter indexes, and Bitmap indexes, etc. -In the process of business optimization, fully analyzing business characteristics and effectively utilizing indexes can greatly enhance the effectiveness of queries and analyses, thereby achieving the purpose of performance tuning. +In the process of business optimization, fully analyzing business characteristics and make effective use of indexes can greatly enhance the effectiveness of queries and analyses, thereby achieving the purpose of performance tuning. For a detailed introduction to various indexes, please refer to the [Table Index](../../../table-design/index/index-overview) section. This chapter will demonstrate index usage techniques in several typical scenarios from the perspective of actual cases and summarize optimization suggestions for reference in business tuning. @@ -85,7 +85,7 @@ When defining the schema column order, reference the high-frequency and high-pri Doris supports inverted indexes as secondary indexes to accelerate business scenarios such as equal value, range, and full-text search of text types. The creation and management of inverted indexes are independent, allowing for convenient business performance optimization without affecting the original table schema and without the need to re-import table data. -For typical usage scenarios, syntax, and cases, please refer to the Table Index - Inverted Index section for a detailed introduction, so this chapter will not repeat the explanation. +For typical usage scenarios, syntax, and cases, please refer to the [Table Index - Inverted Index](../../../table-design/index/inverted-index) section for a detailed introduction, so this chapter will not repeat the explanation. :::tip diff --git a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md index 546ec64c45..bd8f25cb3a 100644 --- a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md +++ b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/solving-unevenly-data-distribution.md @@ -53,7 +53,7 @@ When estimating the filter rate, the optimizer typically relies on the assumptio Considering the following SQL query: ```sql -select count() +select count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` @@ -67,7 +67,7 @@ To optimize the query, we need to adjust the SQL statement based on the actual s Revising the SQL as follows: ```sql -select /* leading(orders customer) */ count() +select /* +leading(orders customer) */ count(1) from orders, customer where o_custkey = c_custkey and o_orderdate < '1920-01-02' ``` \ No newline at end of file diff --git a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md index bd1827dcd4..96be04741a 100644 --- a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md +++ b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-async-mv.md @@ -34,9 +34,11 @@ Next, through an example, we will demonstrate in detail how to use aync-material ### 1 Creation of Base Tables and Data Insertion -First, create two tables, `orders` and `lineitem`, in the TPC-H database, and insert the corresponding data. +First, create two tables, `orders` and `lineitem`, in the tpch database, and insert the corresponding data. ```sql +CREATE DATABASE IF NOT EXISTS tpch; + USE tpch; -- Create the orders table diff --git a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md index 1ea5165bac..564471a277 100644 --- a/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md +++ b/versioned_docs/version-3.0/query-acceleration/tuning/tuning-plan/transparent-rewriting-with-sync-mv.md @@ -40,44 +40,6 @@ Materialized views are suitable for the following scenarios: For queries that frequently reuse the same subquery results, a sync-materialized view can significantly enhance performance. Doris automatically maintains the data in the materialized view, ensuring data consistency between the base table and the materialized view without additional manual maintenance costs. During queries, the system automatically matches the optimal materialized view and reads data directly from it. -When using materialized views, please note the following points: - -1. In Doris version 2.0, materialized views have enhanced features. It is recommended that users confirm in a test environment whether the expected queries can hit the desired materialized views before using them in a formal production environment. - -2. It is not advisable to create multiple similar materialized views on the same table, as this may lead to conflicts between them, resulting in query misses. - -## Usage Process - -The usage process for materialized views is as follows: - -### 1 Create a Materialized View - -1. Determine the type of materialized view to create based on the characteristics of the query statement. - -2. Extract the common grouping and aggregation methods from multiple queries as the basis for defining the materialized view. - -3. It is not necessary to create materialized views for all dimension combinations; only create them for commonly used dimension combinations. - -4. Creating a materialized view is an asynchronous operation. After submitting the creation task, Doris will compute the existing data in the background until the creation is successful. - -### 2 Automatic Query Matching - -1. After the materialized view is successfully created, when a user queries the base table, Doris will automatically select an optimal materialized view and read data from it for computation. - -2. Users can use the EXPLAIN command to check whether the current query is using a materialized view. - -### 3 Update Strategy - -To ensure data consistency between the materialized view and the base table, Doris synchronizes operations on the base table to the materialized view, using incremental updates to improve update efficiency and ensuring the atomicity of operations through transactions. - -### 4 Supported Aggregation Functions - -1. SUM, MIN, MAX (applicable to Version 0.12) - -2. COUNT, BITMAP_UNION, HLL_UNION (applicable to Version 0.13) - -3. General aggregation functions (applicable to Version 2.0) - ## Tuning Usage Case The following is a specific example to illustrate the use of single-table materialized views: --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org