This is an automated email from the ASF dual-hosted git repository.

jeffreyh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new cbbf5e845dc [mtmv]mtmv support datalake (#2791)
cbbf5e845dc is described below

commit cbbf5e845dc2d4c5e8b24bc1a635b3f20e4dcf56
Author: zhangdong <[email protected]>
AuthorDate: Fri Oct 17 10:23:33 2025 +0800

    [mtmv]mtmv support datalake (#2791)
    
    ## Versions
    
    - [x] dev
    - [x] 3.0
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 .../async-materialized-view/overview.md            | 22 +++++++++++++++-------
 .../async-materialized-view/overview.md            | 21 ++++++++++++++-------
 .../async-materialized-view/overview.md            | 21 ++++++++++++++-------
 .../async-materialized-view/overview.md            | 22 +++++++++++++++-------
 4 files changed, 58 insertions(+), 28 deletions(-)

diff --git a/docs/query-acceleration/materialized-view/async-materialized-view/overview.md b/docs/query-acceleration/materialized-view/async-materialized-view/overview.md
index 3c6014391f8..c5d93b4907e 100644
--- a/docs/query-acceleration/materialized-view/async-materialized-view/overview.md
+++ b/docs/query-acceleration/materialized-view/async-materialized-view/overview.md
@@ -42,7 +42,15 @@ Transparent rewriting is an important means for databases to optimize query perf
 
 Doris asynchronous materialized views utilize a transparent rewriting algorithm based on the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) model. This algorithm can deeply analyze the structural information of SQL, automatically searching for and selecting suitable materialized views for transparent rewriting. When multiple materialized views are available, the algorithm will also choose the optimal materialized view to respond to the query SQL based on certain strategies (such as cost models), fu [...]
 
-## Support for Materialized Refresh Data Lake
+## Creating Asynchronous Materialized Views Based on Data Lakes
+The syntax for creating asynchronous materialized views based on data lakes is exactly the same as that for creating asynchronous materialized views based on internal tables, but there are some considerations:
+- Refreshing materialized views requires metadata from the data lake, such as partition version information. This information is obtained from the metadata cache in the data lake rather than directly from the external environment. Therefore, after the materialized view is refreshed, the data remains consistent with the results queried from the data lake through Doris. However, it may not match the results queried from the data lake through other engines, depending on the refresh status o [...]
+- If the underlying Hive data is modified by an external process not controlled by Doris (such as Spark, Hive, or Flink jobs) without changing the metadata (e.g., executing insert overwrite), the materialized view may assume consistency with the base table data, but the queried data may not match the results queried from the data lake through Doris. This issue can be resolved by manually forcing a refresh of the materialized view.
+- When creating partitioned materialized views based on Iceberg, only Iceberg tables with a single partition column are supported. Limited support is provided for partition evolution. For example, changes to the time range of a time-based partition are supported, but changes to the partition field are not. If the partition field is modified, the materialized view refresh will fail.
+- When creating materialized views based on Hudi, there is no awareness of whether the base table data has changed. Therefore, once the materialized view (or a partition of the materialized view) has been refreshed, it is considered synchronized with the base table. As a result, creating materialized views based on Hudi is only suitable for scenarios requiring manual on-demand refresh.
+
+
+### Support for Materialized Refresh Data Lake
 
 The support for materialized refresh data lakes varies by table type and catalog.
 
@@ -76,21 +84,21 @@ The support for materialized refresh data lakes varies by table type and catalog
         <td>Iceberg</td>
         <td>Iceberg</td>
         <td>Supported in 2.1</td>
-        <td>Not supported</td>
+        <td>Supported in 3.1</td>
         <td>Not supported</td>
     </tr>
     <tr>
         <td>Paimon</td>
         <td>Paimon</td>
         <td>Supported in 2.1</td>
-        <td>Not supported</td>
+        <td>Supported in 3.1</td>
         <td>Not supported</td>
     </tr>
     <tr>
         <td>Hudi</td>
         <td>Hudi</td>
         <td>Supported in 2.1</td>
-        <td>Not supported</td>
+        <td>Supported in 3.1</td>
         <td>Not supported</td>
     </tr>
     <tr>
@@ -109,7 +117,7 @@ The support for materialized refresh data lakes varies by table type and catalog
     </tr>
 </table>
 
-## Transparent Rewriting Support for Data Lake
+### Transparent Rewriting Support for Data Lake
 Currently, the transparent rewriting feature of asynchronous materialized views supports the following types of tables and catalogs.
 
 Real-time Base Table Data Awareness: Refers to the materialized view's ability to detect changes in the underlying table data it uses and utilize the latest data during queries.
@@ -149,7 +157,7 @@ Real-time Base Table Data Awareness: Refers to the materialized view's ability t
         <td>Hudi</td>
         <td>Hudi</td>
         <td>Supported</td>
-        <td>3.1 Supported</td>
+        <td>Not supported</td>
     </tr>
     <tr>
         <td>JDBC</td>
@@ -165,7 +173,7 @@ Real-time Base Table Data Awareness: Refers to the materialized view's ability t
     </tr>
 </table>
 
-Materialized views using external tables do not participate in transparent rewriting by default, because they cannot detect changes in external table data and cannot guarantee the data in the materialized view is up-to-date.
+Materialized views using external tables do not participate in transparent rewriting by default.
 If you want to enable transparent rewriting for materialized views containing external tables, you can set `SET materialized_view_rewrite_enable_contain_external_table = true`.
 
 Since version 2.1.11, Doris has optimized the transparent rewriting performance for external tables, mainly improving the performance of obtaining available materialized views containing external tables.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/materialized-view/async-materialized-view/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/materialized-view/async-materialized-view/overview.md
index a8f33bd423a..5640fa7b226 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/materialized-view/async-materialized-view/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/materialized-view/async-materialized-view/overview.md
@@ -46,7 +46,14 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
 算法还会根据一定的策略(如成本模型)选择最优的物化视图来响应查询 SQL,从而进一步提升查询性能。
 
 
-## 物化刷新数据湖支持情况
+## 基于数据湖创建异步物化视图
+基于数据湖创建异步物化视图的语法和基于内表创建异步物化视图的语法完全一样,但有一些注意事项:
+- 物化视图刷新需要使用到数据湖的元数据,例如分区版本信息,这些信息是从数据湖中的元数据缓存获取的,并不是直接从外部环境中获取,因此物化视图刷新完成后,数据和通过 Doris 查询数据湖的结果保持一致,但有可能和通过其它引擎查询数据湖的结果不一致,取决于缓存的刷新情况
+- 如果 Hive 底层数据通过非 Doris 控制的外部流程(如Spark、Hive或Flink作业)发生了变更,但是却没有改变元数据,例如执行了 `insert overwrite`,会导致物化视图认为和基表数据一致,但是查询出的数据和通过 Doris 查询数据湖的结果不一致,可以手动强制刷新物化视图来解决此问题
+- 基于 iceberg 创建分区物化视图,仅支持 iceberg 表分区列仅有一列的情况,并且有限度的支持分区演进功能,例如时间类型的分区时间范围发生了变化是支持的,如果分区字段发生了变化是不支持的,物化视图刷新会失败
+- 基于 hudi 创建物化视图,无法感知到基表数据是否发生了变化,所以只要刷新过物化视图(或物化视图的部分分区),就会认为物化视图(或这些分区)和基表是同步的,所以基于 hudi 创建物化视图仅适用于手动按需刷新的场景
+
+### 物化刷新数据湖支持情况
 
 物化刷新数据湖的支持情况,不同类型的表和 Catalog 有不同的支持程度
 
@@ -80,21 +87,21 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
         <td>Iceberg</td>
         <td>Iceberg</td>
         <td>2.1 支持</td>
-        <td>不支持</td>
+        <td>3.1 支持</td>
         <td>不支持</td>
     </tr>
     <tr>
         <td>Paimon</td>
         <td>Paimon</td>
         <td>2.1 支持</td>
-        <td>不支持</td>
+        <td>3.1 支持</td>
         <td>不支持</td>
     </tr>
     <tr>
         <td>Hudi</td>
         <td>Hudi</td>
         <td>2.1 支持</td>
-        <td>不支持</td>
+        <td>3.1 支持</td>
         <td>不支持</td>
     </tr>
     <tr>
@@ -113,7 +120,7 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
     </tr>
 </table>
 
-## 透明改写数据湖支持情况
+### 透明改写数据湖支持情况
 
 目前,异步物化视图的透明改写功能支持以下类型的表和 Catalog。
 
@@ -155,7 +162,7 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
         <td>Hudi</td>
         <td>Hudi</td>
         <td>支持</td>
-        <td>3.1 支持</td>
+        <td>不支持</td>
     </tr>
     <tr>
         <td>JDBC</td>
@@ -171,7 +178,7 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
     </tr>
 </table>
 
-物化视图使用外表,此物化视图默认是不参与透明改写的,因为因为无法感知到外表数据的变化,无法保证物化视图中的数据是最新的。
+物化视图使用外表,此物化视图默认是不参与透明改写的。
 如果想要使用外表的物化视图参与透明改写,可以通过设置 `SET materialized_view_rewrite_enable_contain_external_table = true` 来开启。
 
 自 2.1.11 起,Doris 优化了外表的透明改写性能,主要优化了获取包含外表可用物化的性能。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md
index 636e09c571a..49f33d80166 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md
@@ -46,7 +46,14 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
 算法还会根据一定的策略(如成本模型)选择最优的物化视图来响应查询 SQL,从而进一步提升查询性能。
 
 
-## 物化刷新数据湖支持情况
+## 基于数据湖创建异步物化视图
+基于数据湖创建异步物化视图的语法和基于内表创建异步物化视图的语法完全一样,但有一些注意事项:
+- 物化视图刷新需要使用到数据湖的元数据,例如分区版本信息,这些信息是从数据湖中的元数据缓存获取的,并不是直接从外部环境中获取,因此物化视图刷新完成后,数据和通过 Doris 查询数据湖的结果保持一致,但有可能和通过其它引擎查询数据湖的结果不一致,取决于缓存的刷新情况
+- 如果 Hive 底层数据通过非 Doris 控制的外部流程(如Spark、Hive或Flink作业)发生了变更,但是却没有改变元数据,例如执行了 `insert overwrite`,会导致物化视图认为和基表数据一致,但是查询出的数据和通过 Doris 查询数据湖的结果不一致,可以手动强制刷新物化视图来解决此问题
+- 基于 iceberg 创建分区物化视图,仅支持 iceberg 表分区列仅有一列的情况,并且有限度的支持分区演进功能,例如时间类型的分区时间范围发生了变化是支持的,如果分区字段发生了变化是不支持的,物化视图刷新会失败
+- 基于 hudi 创建物化视图,无法感知到基表数据是否发生了变化,所以只要刷新过物化视图(或物化视图的部分分区),就会认为物化视图(或这些分区)和基表是同步的,所以基于 hudi 创建物化视图仅适用于手动按需刷新的场景
+
+### 物化刷新数据湖支持情况
 
 物化刷新数据湖的支持情况,不同类型的表和 Catalog 有不同的支持程度
 
@@ -80,21 +87,21 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
         <td>Iceberg</td>
         <td>Iceberg</td>
         <td>2.1 支持</td>
-        <td>不支持</td>
+        <td>3.1 支持</td>
         <td>不支持</td>
     </tr>
     <tr>
         <td>Paimon</td>
         <td>Paimon</td>
         <td>2.1 支持</td>
-        <td>不支持</td>
+        <td>3.1 支持</td>
         <td>不支持</td>
     </tr>
     <tr>
         <td>Hudi</td>
         <td>Hudi</td>
         <td>2.1 支持</td>
-        <td>不支持</td>
+        <td>3.1 支持</td>
         <td>不支持</td>
     </tr>
     <tr>
@@ -113,7 +120,7 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
     </tr>
 </table>
 
-## 透明改写数据湖支持情况
+### 透明改写数据湖支持情况
 
 目前,异步物化视图的透明改写功能支持以下类型的表和 Catalog。
 
@@ -155,7 +162,7 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
         <td>Hudi</td>
         <td>Hudi</td>
         <td>支持</td>
-        <td>3.1 支持</td>
+        <td>不支持</td>
     </tr>
     <tr>
         <td>JDBC</td>
@@ -171,7 +178,7 @@ Doris 异步物化视图采用了基于 SPJG(SELECT-PROJECT-JOIN-GROUP-BY)
     </tr>
 </table>
 
-物化视图使用外表,此物化视图默认是不参与透明改写的,因为因为无法感知到外表数据的变化,无法保证物化视图中的数据是最新的。
+物化视图使用外表,此物化视图默认是不参与透明改写的。
 如果想要使用外表的物化视图参与透明改写,可以通过设置 `SET materialized_view_rewrite_enable_contain_external_table = true` 来开启。
 
 自 2.1.11 起,Doris 优化了外表的透明改写性能,主要优化了获取包含外表可用物化的性能。
diff --git a/versioned_docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md b/versioned_docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md
index 3c6014391f8..c5d93b4907e 100644
--- a/versioned_docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md
+++ b/versioned_docs/version-3.x/query-acceleration/materialized-view/async-materialized-view/overview.md
@@ -42,7 +42,15 @@ Transparent rewriting is an important means for databases to optimize query perf
 
 Doris asynchronous materialized views utilize a transparent rewriting algorithm based on the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) model. This algorithm can deeply analyze the structural information of SQL, automatically searching for and selecting suitable materialized views for transparent rewriting. When multiple materialized views are available, the algorithm will also choose the optimal materialized view to respond to the query SQL based on certain strategies (such as cost models), fu [...]
 
-## Support for Materialized Refresh Data Lake
+## Creating Asynchronous Materialized Views Based on Data Lakes
+The syntax for creating asynchronous materialized views based on data lakes is exactly the same as that for creating asynchronous materialized views based on internal tables, but there are some considerations:
+- Refreshing materialized views requires metadata from the data lake, such as partition version information. This information is obtained from the metadata cache in the data lake rather than directly from the external environment. Therefore, after the materialized view is refreshed, the data remains consistent with the results queried from the data lake through Doris. However, it may not match the results queried from the data lake through other engines, depending on the refresh status o [...]
+- If the underlying Hive data is modified by an external process not controlled by Doris (such as Spark, Hive, or Flink jobs) without changing the metadata (e.g., executing insert overwrite), the materialized view may assume consistency with the base table data, but the queried data may not match the results queried from the data lake through Doris. This issue can be resolved by manually forcing a refresh of the materialized view.
+- When creating partitioned materialized views based on Iceberg, only Iceberg tables with a single partition column are supported. Limited support is provided for partition evolution. For example, changes to the time range of a time-based partition are supported, but changes to the partition field are not. If the partition field is modified, the materialized view refresh will fail.
+- When creating materialized views based on Hudi, there is no awareness of whether the base table data has changed. Therefore, once the materialized view (or a partition of the materialized view) has been refreshed, it is considered synchronized with the base table. As a result, creating materialized views based on Hudi is only suitable for scenarios requiring manual on-demand refresh.
+
+
+### Support for Materialized Refresh Data Lake
 
 The support for materialized refresh data lakes varies by table type and catalog.
 
@@ -76,21 +84,21 @@ The support for materialized refresh data lakes varies by table type and catalog
         <td>Iceberg</td>
         <td>Iceberg</td>
         <td>Supported in 2.1</td>
-        <td>Not supported</td>
+        <td>Supported in 3.1</td>
         <td>Not supported</td>
     </tr>
     <tr>
         <td>Paimon</td>
         <td>Paimon</td>
         <td>Supported in 2.1</td>
-        <td>Not supported</td>
+        <td>Supported in 3.1</td>
         <td>Not supported</td>
     </tr>
     <tr>
         <td>Hudi</td>
         <td>Hudi</td>
         <td>Supported in 2.1</td>
-        <td>Not supported</td>
+        <td>Supported in 3.1</td>
         <td>Not supported</td>
     </tr>
     <tr>
@@ -109,7 +117,7 @@ The support for materialized refresh data lakes varies by table type and catalog
     </tr>
 </table>
 
-## Transparent Rewriting Support for Data Lake
+### Transparent Rewriting Support for Data Lake
 Currently, the transparent rewriting feature of asynchronous materialized views supports the following types of tables and catalogs.
 
 Real-time Base Table Data Awareness: Refers to the materialized view's ability to detect changes in the underlying table data it uses and utilize the latest data during queries.
@@ -149,7 +157,7 @@ Real-time Base Table Data Awareness: Refers to the materialized view's ability t
         <td>Hudi</td>
         <td>Hudi</td>
         <td>Supported</td>
-        <td>3.1 Supported</td>
+        <td>Not supported</td>
     </tr>
     <tr>
         <td>JDBC</td>
@@ -165,7 +173,7 @@ Real-time Base Table Data Awareness: Refers to the materialized view's ability t
     </tr>
 </table>
 
-Materialized views using external tables do not participate in transparent rewriting by default, because they cannot detect changes in external table data and cannot guarantee the data in the materialized view is up-to-date.
+Materialized views using external tables do not participate in transparent rewriting by default.
 If you want to enable transparent rewriting for materialized views containing external tables, you can set `SET materialized_view_rewrite_enable_contain_external_table = true`.
 
 Since version 2.1.11, Doris has optimized the transparent rewriting performance for external tables, mainly improving the performance of obtaining available materialized views containing external tables.
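
The workflow described in the added documentation can be sketched in SQL roughly as follows: create a manually refreshed asynchronous materialized view over a Hive table, force a full refresh after an external job rewrites the underlying data, and opt in to transparent rewriting. This is a minimal sketch, not taken from the commit. The catalog, database, table, column, and view names are hypothetical, and the bucket count and `replication_num` value are assumptions suited to a small test cluster.

```sql
-- Hypothetical names: hive_catalog, sales_db, orders, mv_orders_daily.
-- Assumes the current catalog/database is an internal Doris database.

-- Same CREATE syntax as for internal tables; refresh only on demand.
CREATE MATERIALIZED VIEW mv_orders_daily
BUILD DEFERRED
REFRESH AUTO ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 2          -- assumed bucket count
PROPERTIES ('replication_num' = '1')     -- assumed single-replica test setup
AS
SELECT order_date, SUM(amount) AS total_amount
FROM hive_catalog.sales_db.orders
GROUP BY order_date;

-- Force a full refresh, e.g. after an external INSERT OVERWRITE that did
-- not change the Hive metadata seen by Doris.
REFRESH MATERIALIZED VIEW mv_orders_daily COMPLETE;

-- Opt in to transparent rewriting for views that contain external tables
-- (disabled by default, as stated in the documentation change above).
SET materialized_view_rewrite_enable_contain_external_table = true;
```

`COMPLETE` is used here because it refreshes every partition regardless of whether Doris has detected a change, which matches the "manually forcing a refresh" remedy in the text.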

