This is an automated email from the ASF dual-hosted git repository.
panxiaolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 8d16f1bb09 [Chore](materialized-view) update documentation about
materialized-view and update test (#22350)
8d16f1bb09 is described below
commit 8d16f1bb097ef0bd52a1d08e10d74f0f5301f10a
Author: Pxl <[email protected]>
AuthorDate: Tue Aug 1 15:13:34 2023 +0800
[Chore](materialized-view) update documentation about materialized-view and
update test (#22350)
update documentation about materialized-view and update test
---
.../docs/query-acceleration/materialized-view.md | 22 ++++---
.../docs/query-acceleration/materialized-view.md | 21 ++++---
.../data/mv_p0/test_doc_e4/test_doc_e4.out | 19 ++++++
.../suites/mv_p0/test_doc_e4/test_doc_e4.groovy | 67 ++++++++++++++++++++++
4 files changed, 106 insertions(+), 23 deletions(-)
diff --git a/docs/en/docs/query-acceleration/materialized-view.md
b/docs/en/docs/query-acceleration/materialized-view.md
index b4d50cd2a0..b7f06ff305 100644
--- a/docs/en/docs/query-acceleration/materialized-view.md
+++ b/docs/en/docs/query-acceleration/materialized-view.md
@@ -85,8 +85,7 @@ In `Doris 2.0` we made some enhancements to materialized
views (described in `Be
If you don't know how to verify that a query hits a materialized view, you can
read `Best Practice 1` of this article.
-At the same time, we do not recommend that users create multiple materialized
views with similar shapes on the same table, as this may cause conflicts
between multiple materialized views and cause query hit failures. (Of course,
these possible problems can be verified in the test environment)
-
+At the same time, we do not recommend that users create multiple materialized
views with similar shapes on the same table, which may cause conflicts between
multiple materialized views and cause query hit failures (this problem will be
improved in the new optimizer ). It is recommended that users first verify
whether materialized views and queries meet the requirements and can be used
normally in the test environment.
### Support aggregate functions
The aggregate functions currently supported by the materialized view function
are:
@@ -489,7 +488,7 @@ This problem can be solved by creating a materialized view
with k3 as the first
<version since="2.0.0"></version>
-In `Doris 2.0`, we have made some enhancements to the expressions supported by
the materialized view. This example will mainly reflect the support for various
expressions of the new version of the materialized view.
+In `Doris 2.0`, we have made some enhancements to the expressions supported by
the materialized view. This example will mainly reflect the support and early
filtering of the new version of the materialized view for various expressions.
1. Create a base table and insert some data.
```sql
@@ -497,7 +496,7 @@ In `Doris 2.0`, we have made some enhancements to the
expressions supported by t
k1 int null,
k2 int not null,
k3 bigint null,
- k4 varchar(100) null
+ k4 date null
)
duplicate key (k1,k2,k3)
distributed BY hash(k1) buckets 3
@@ -511,24 +510,23 @@ In `Doris 2.0`, we have made some enhancements to the
expressions supported by t
2. Create some materialized views.
```sql
create materialized view k1a2p2ap3ps as select
abs(k1)+k2+1,sum(abs(k2+2)+k3+3) from d_table group by abs(k1)+k2+1;
- create materialized view kymd as select year(k4),month(k4),day(k4) from
d_table;
+ create materialized view kymd as select year(k4),month(k4) from d_table
where year(k4) = 2020; // Filter with where expression in advance to reduce the
amount of data in the materialized view.
```
3. Use some queries to test if the materialized view was successfully hit.
```sql
select abs(k1)+k2+1, sum(abs(k2+2)+k3+3) from d_table group by
abs(k1)+k2+1; // hit k1a2p2ap3ps
select bin(abs(k1)+k2+1), sum(abs(k2+2)+k3+3) from d_table group by
bin(abs(k1)+k2+1); // hit k1a2p2ap3ps
- select year(k4),month(k4),day(k4) from d_table; // hit kymd
- select year(k4)+month(k4)+day(k4) from d_table where year(k4) = 2020; //
hit kymd
+ select year(k4),month(k4),day(k4) from d_table; // cannot hit the
materialized view because the where condition does not match
+ select year(k4)+month(k4) from d_table where year(k4) = 2020; // hit kymd
```
## Limitations
-1. The parameter of the aggregate function of the materialized view does not
support the expression only supports a single column, for example: sum(a+b)
does not support. (Supported after 2.0)
-2. If the conditional column of the delete statement does not exist in the
materialized view, the delete operation cannot be performed. If you must delete
data, you need to delete the materialized view before deleting the data.
-3. Too many materialized views on a single table will affect the efficiency of
importing: When importing data, the materialized view and base table data are
updated synchronously. If a table has more than 10 materialized view tables, it
may cause the import speed to be very high. slow. This is the same as a single
import needs to import 10 tables at the same time.
-4. The same column with different aggregate functions cannot appear in a
materialized view at the same time. For example, select sum(a), min(a) from
table are not supported. (Supported after 2.0)
-5. For the Unique Key data model, the materialized view can only change the
column order and cannot play the role of aggregation. Therefore, in the Unique
Key model, it is not possible to perform coarse-grained aggregation operations
on the data by creating a materialized view.
+1. If the condition column of the delete statement does not exist in the
materialized view, the delete operation cannot be performed. If you must delete
the data, you need to delete the materialized view before deleting the data.
+2. Too many materialized views on a single table will affect the efficiency of
import: when importing data, the materialized view and Base table data are
updated synchronously. If a table has more than 10 materialized views, the
import speed may be slow. slow. This is the same as if a single import needs to
import 10 table data at the same time.
+3. For the Unique Key data model, the materialized view can only change the
order of the columns and cannot perform aggregation. Therefore, it is not
possible to perform coarse-grained aggregation operations on the data by
creating a materialized view on the Unique Key model.
+4. At present, the rewriting behavior of some optimizers to SQL may cause the
materialized view to fail to be hit. For example, k1+1-1 is rewritten as k1,
between is rewritten as <= and >=, and day is rewritten as dayofmonth. In this
case, you need to manually adjust the statements of the query and materialized
view.
## Error
1. DATA_QUALITY_ERROR: "The data quality does not satisfy, please check your
data"
diff --git a/docs/zh-CN/docs/query-acceleration/materialized-view.md
b/docs/zh-CN/docs/query-acceleration/materialized-view.md
index a024711f9a..410ca2cdfc 100644
--- a/docs/zh-CN/docs/query-acceleration/materialized-view.md
+++ b/docs/zh-CN/docs/query-acceleration/materialized-view.md
@@ -83,7 +83,7 @@ Doris 系统提供了一整套对物化视图的 DDL 语法,包括创建,查
如果不清楚如何验证一个查询是否命中物化视图,可以阅读本文的`最佳实践1`。
-与此同时,我们不建议用户在同一张表上建多个形态类似的物化视图,这可能会导致多个物化视图之间的冲突使得查询命中失败。(当然,这些可能出现的问题都可以在测试环境中验证)
+与此同时,我们不建议用户在同一张表上建多个形态类似的物化视图,这可能会导致多个物化视图之间的冲突使得查询命中失败(在新优化器中这个问题会有所改善)。建议用户先在测试环境中验证物化视图和查询是否满足需求并能正常使用。
### 支持聚合函数
@@ -487,7 +487,7 @@ MySQL [test]> desc advertiser_view_record;
<version since="2.0.0"></version>
-在`Doris 2.0`中,我们对物化视图所支持的表达式做了一些增强,本示例将主要体现新版本物化视图对各种表达式的支持。
+在`Doris 2.0`中,我们对物化视图所支持的表达式做了一些增强,本示例将主要体现新版本物化视图对各种表达式的支持和提前过滤。
1. 创建一个 Base 表并插入一些数据。
```sql
@@ -495,7 +495,7 @@ create table d_table (
k1 int null,
k2 int not null,
k3 bigint null,
- k4 varchar(100) null
+ k4 date null
)
duplicate key (k1,k2,k3)
distributed BY hash(k1) buckets 3
@@ -509,24 +509,23 @@ insert into d_table select 3,-3,null,'2022-02-20';
2. 创建一些物化视图。
```sql
create materialized view k1a2p2ap3ps as select
abs(k1)+k2+1,sum(abs(k2+2)+k3+3) from d_table group by abs(k1)+k2+1;
-create materialized view kymd as select year(k4),month(k4),day(k4) from
d_table;
+create materialized view kymd as select year(k4),month(k4) from d_table where
year(k4) = 2020; // 提前用where表达式过滤以减少物化视图中的数据量。
```
3. 用一些查询测试是否成功命中物化视图。
```sql
select abs(k1)+k2+1,sum(abs(k2+2)+k3+3) from d_table group by abs(k1)+k2+1; //
命中k1a2p2ap3ps
select bin(abs(k1)+k2+1),sum(abs(k2+2)+k3+3) from d_table group by
bin(abs(k1)+k2+1); // 命中k1a2p2ap3ps
-select year(k4),month(k4),day(k4) from d_table; // 命中kymd
-select year(k4)+month(k4)+day(k4) from d_table where year(k4) = 2020; // 命中kymd
+select year(k4),month(k4) from d_table; // 无法命中物化视图,因为where条件不匹配
+select year(k4)+month(k4) from d_table where year(k4) = 2020; // 命中kymd
```
## 局限性
-1. 物化视图的聚合函数的参数不支持表达式仅支持单列,比如: sum(a+b) 不支持。(2.0 后支持)
-2. 如果删除语句的条件列,在物化视图中不存在,则不能进行删除操作。如果一定要删除数据,则需要先将物化视图删除,然后方可删除数据。
-3. 单表上过多的物化视图会影响导入的效率:导入数据时,物化视图和 Base 表数据是同步更新的,如果一张表的物化视图表超过 10
张,则有可能导致导入速度很慢。这就像单次导入需要同时导入 10 张表数据是一样的。
-4. 相同列,不同聚合函数,不能同时出现在一张物化视图中,比如:select sum(a), min(a) from table 不支持。(2.0后支持)
-5. 物化视图针对 Unique Key数据模型,只能改变列顺序,不能起到聚合的作用,所以在Unique
Key模型上不能通过创建物化视图的方式对数据进行粗粒度聚合操作
+1. 如果删除语句的条件列,在物化视图中不存在,则不能进行删除操作。如果一定要删除数据,则需要先将物化视图删除,然后方可删除数据。
+2. 单表上过多的物化视图会影响导入的效率:导入数据时,物化视图和 Base 表数据是同步更新的,如果一张表的物化视图表超过 10
张,则有可能导致导入速度很慢。这就像单次导入需要同时导入 10 张表数据是一样的。
+3. 物化视图针对 Unique Key数据模型,只能改变列顺序,不能起到聚合的作用,所以在Unique
Key模型上不能通过创建物化视图的方式对数据进行粗粒度聚合操作
+4.
目前一些优化器对sql的改写行为可能会导致物化视图无法被命中,例如k1+1-1被改写成k1,between被改写成<=和>=,day被改写成dayofmonth,遇到这种情况需要手动调整下查询和物化视图的语句。
## 异常错误
diff --git a/regression-test/data/mv_p0/test_doc_e4/test_doc_e4.out
b/regression-test/data/mv_p0/test_doc_e4/test_doc_e4.out
new file mode 100644
index 0000000000..cf979564f6
--- /dev/null
+++ b/regression-test/data/mv_p0/test_doc_e4/test_doc_e4.out
@@ -0,0 +1,19 @@
+-- This file is automatically generated. You should know what you did if you
want to edit this
+-- !select_mv --
+1 \N
+3 7
+5 9
+
+-- !select_mv --
+1 \N
+101 9
+11 7
+
+-- !select_mv --
+2020 2
+2021 2
+2022 2
+
+-- !select_mv --
+2022
+
diff --git a/regression-test/suites/mv_p0/test_doc_e4/test_doc_e4.groovy
b/regression-test/suites/mv_p0/test_doc_e4/test_doc_e4.groovy
new file mode 100644
index 0000000000..c64d2db79c
--- /dev/null
+++ b/regression-test/suites/mv_p0/test_doc_e4/test_doc_e4.groovy
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import org.codehaus.groovy.runtime.IOGroovyMethods
+
+suite ("test_doc_e4") {
+
+ sql """ DROP TABLE IF EXISTS d_table; """
+
+ sql """
+ create table d_table (
+ k1 int null,
+ k2 int not null,
+ k3 bigint null,
+ k4 date null
+ )
+ duplicate key (k1,k2,k3)
+ distributed BY hash(k1) buckets 3
+ properties("replication_num" = "1");
+ """
+
+ sql "insert into d_table select 1,1,1,'2020-02-20';"
+ sql "insert into d_table select 2,2,2,'2021-02-20';"
+
+ createMV ("create materialized view k1a2p2ap3ps as select
abs(k1)+k2+1,sum(abs(k2+2)+k3+3) from d_table group by abs(k1)+k2+1;")
+ createMV ("create materialized view kymd as select year(k4),month(k4) from
d_table where year(k4) = 2020;")
+
+ sql "insert into d_table select 3,-3,null,'2022-02-20';"
+
+ explain {
+ sql("select abs(k1)+k2+1,sum(abs(k2+2)+k3+3) from d_table group by
abs(k1)+k2+1 order by 1,2;")
+ contains "(k1a2p2ap3ps)"
+ }
+ qt_select_mv "select abs(k1)+k2+1,sum(abs(k2+2)+k3+3) from d_table group
by abs(k1)+k2+1 order by 1,2;"
+
+ explain {
+ sql("select bin(abs(k1)+k2+1),sum(abs(k2+2)+k3+3) from d_table group
by bin(abs(k1)+k2+1);")
+ contains "(k1a2p2ap3ps)"
+ }
+ qt_select_mv "select bin(abs(k1)+k2+1),sum(abs(k2+2)+k3+3) from d_table
group by bin(abs(k1)+k2+1) order by 1,2;"
+
+ explain {
+ sql("select year(k4),month(k4) from d_table;")
+ contains "(d_table)"
+ }
+ qt_select_mv "select year(k4),month(k4) from d_table order by 1,2;"
+
+ explain {
+ sql("select year(k4)+month(k4) from d_table where year(k4) = 2020;")
+ contains "(kymd)"
+ }
+ qt_select_mv "select year(k4)+month(k4) from d_table where year(k4) = 2020
order by 1;"
+}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]