This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 85e0f08df2c [doc] (doris catalog) doris catalog supports new insert feature (#3416)
85e0f08df2c is described below
commit 85e0f08df2c83089c76df37d5220add613f7ef1e
Author: TsukiokaKogane <[email protected]>
AuthorDate: Fri Feb 27 16:30:42 2026 +0800
[doc] (doris catalog) doris catalog supports new insert feature (#3416)
## Versions
- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---------
Co-authored-by: Mingyu Chen (Rayner) <[email protected]>
---
docs/lakehouse/catalogs/doris-catalog.mdx | 156 +++++++++++++--------
.../current/lakehouse/catalogs/doris-catalog.mdx | 117 +++++++++++-----
.../lakehouse/catalogs/doris-catalog.mdx | 117 +++++++++++-----
.../lakehouse/catalogs/doris-catalog.mdx | 156 +++++++++++++--------
4 files changed, 356 insertions(+), 190 deletions(-)
diff --git a/docs/lakehouse/catalogs/doris-catalog.mdx b/docs/lakehouse/catalogs/doris-catalog.mdx
index d5c902d0800..07f51e22456 100644
--- a/docs/lakehouse/catalogs/doris-catalog.mdx
+++ b/docs/lakehouse/catalogs/doris-catalog.mdx
@@ -2,7 +2,7 @@
{
"title": "Doris Catalog",
"language": "en",
- "description": "Doris Catalog enables cross-cluster federated analysis across multiple Doris clusters via Arrow Flight or virtual cluster modes, with configuration, type mapping and query optimization guides."
+ "description": "Doris Catalog supports cross-cluster federated analysis across multiple Doris clusters. It supports both Arrow Flight and Virtual Cluster modes, enabling efficient multi-cluster data querying, writing, and federated analysis for distributed data warehouse scenarios."
}
---
@@ -19,9 +19,9 @@ This is an experimental feature.
Perform cross-cluster federated analysis across multiple Doris clusters.
-Unlike connecting to other Doris clusters through JDBC Catalog, this solution enables efficient multi-Doris cluster federated analysis through Arrow Flight or virtual cluster mode.
+Unlike connecting to other Doris clusters through JDBC Catalog, this solution enables efficient federated analysis across multiple Doris clusters through Arrow Flight or Virtual Cluster mode.
-## Configuring Catalog
+## Configure Catalog
### Syntax
@@ -40,106 +40,108 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
);
```
-* `fe_http_hosts`
+* `fe_http_hosts`
- List of remote Doris cluster FE HTTP service endpoints.
+ List of FE HTTP service endpoints for the remote Doris cluster.
-* `fe_arrow_hosts`
+* `fe_arrow_hosts`
- List of remote Doris cluster FE Arrow Flight service endpoints.
+ List of FE Arrow Flight service endpoints for the remote Doris cluster.
-* `fe_thrift_hosts`
+* `fe_thrift_hosts`
- List of remote Doris cluster FE Thrift service endpoints.
+ List of FE Thrift service endpoints for the remote Doris cluster.
- > Note: In version 4.0.2, user Master FE's address. This issue will be fixed in next version.
+ > In version 4.0.2, please fill in the Master FE address. This issue will be fixed in subsequent versions.
-* `use_arrow_flight`
+* `use_arrow_flight`
- Whether to access the remote Doris cluster using Arrow Flight or treat remote tables as internal tables and send execution plans to the remote Doris cluster for execution.
+ Whether to access the remote Doris cluster via Arrow Flight, or send the execution plan to the remote Doris cluster as if the remote table were an internal table.
-* `{QueryProperties}`
+* `{QueryProperties}`
- Optional properties
-
- | Parameter Name | Description | Default Value |
- |----------------|-------------|---------------|
- | `enable_parallel_result_sink` | When enabled, local Doris BE nodes will pull data in parallel from each BE node of the remote Doris cluster. (For Arrow Flight mode) | true |
- | `query_retry_count` | Maximum retry count for failed query requests to remote Doris. (Does not include failures that may occur during asynchronous execution after the request is accepted by remote Doris) | 3 |
- | `query_timeout_sec` | Timeout for sending queries to remote Doris. (Does not include asynchronous execution time after the request is accepted by remote Doris) | 15 |
- | `compatible` | Used to attempt compatibility with metadata formats when accessing remote Doris with versions lower than the local cluster. No need to enable when cluster versions are consistent. | false |
+ Optional properties.
-* `{HttpClientProperties}`
+ | Parameter Name | Description | Default |
+ |----------------------------------|-------------------------------------------------------------------------------------------------------|---------|
+ | `enable_parallel_result_sink` | When enabled, local Doris BE nodes will pull data in parallel from various BE nodes of the remote Doris cluster. (For Arrow Flight mode) | true |
+ | `query_retry_count` | Maximum number of retries for failed query requests sent to the remote Doris. (Does not include failures that may occur during asynchronous execution after the request is accepted) | 3 |
+ | `query_timeout_sec` | Timeout for sending queries to the remote Doris. (Does not include asynchronous execution time after the request is accepted) | 15 |
+ | `compatible` | Used to attempt compatibility with metadata format when accessing a remote Doris with a version lower than the local cluster. No need to enable when cluster versions are consistent. | false |
- HttpClientProperties section is used to configure HTTP Client related parameters. This client is used to send HTTP requests to synchronize remote cluster metadata. These are all optional parameters.
-
- | Parameter Name | Description | Default Value |
- |----------------|-------------|---------------|
- | `metadata_http_ssl_enabled` | Whether to enable SSL/TLS encrypted communication for HTTP metadata synchronization. | false |
- | `metadata_sync_retry_count` | Maximum retry count for HTTP request failures | 3 |
- | `metadata_max_idle_connections` | Maximum idle connections for HTTP metadata sync client | 5 |
- | `metadata_keep_alive_duration_sec` | Idle connection keep-alive duration for HTTP metadata sync client | 300 |
- | `metadata_connect_timeout_sec` | TCP connection timeout for HTTP metadata sync client | 10 |
- | `metadata_read_timeout_sec` | Socket read timeout for HTTP metadata sync client | 10 |
- | `metadata_write_timeout_sec` | Socket write timeout for HTTP metadata sync client | 10 |
- | `metadata_call_timeout_sec` | HTTP request total timeout for HTTP metadata sync client | 10 |
+* `{HttpClientProperties}`
-* `{CommonProperties}`
+ HttpClientProperties section is used to configure HTTP Client related parameters. This client is used to send HTTP requests to synchronize metadata from the remote cluster. These are all optional parameters.
- CommonProperties section is used to fill in common properties. Please refer to the [Common Properties] section in the Data Catalog Overview.
+ | Parameter Name | Description | Default |
+ |-------------------------------------|-----------------------------------------------------------|---------|
+ | `metadata_http_ssl_enabled` | Whether to enable SSL/TLS encrypted communication for HTTP metadata synchronization. | false |
+ | `metadata_sync_retry_count` | Maximum number of retries for failed HTTP requests. | 3 |
+ | `metadata_max_idle_connections` | Maximum number of idle connections for HTTP metadata synchronization client. | 5 |
+ | `metadata_keep_alive_duration_sec` | Idle connection keep-alive duration for HTTP metadata synchronization client. | 300 |
+ | `metadata_connect_timeout_sec` | TCP connection timeout for HTTP metadata synchronization client. | 10 |
+ | `metadata_read_timeout_sec` | Socket read timeout for HTTP metadata synchronization client. | 10 |
+ | `metadata_write_timeout_sec` | Socket write timeout for HTTP metadata synchronization client. | 10 |
+ | `metadata_call_timeout_sec` | Total HTTP request timeout for HTTP metadata synchronization client. | 10 |
+
+* `{CommonProperties}`
+
+ The CommonProperties section is used to fill in common properties. Please refer to the [Common Properties] section in the Data Catalog Overview.
## Access Modes
### Arrow Flight Mode
-> Supported since 4.0.2.
+> Supported since version 4.0.2.
-When the `use_arrow_flight` property is `true`, it operates in Arrow Flight mode.
+When the `use_arrow_flight` property is set to `true`, it is in Arrow Flight mode.

-In this mode, during cross-cluster queries, FEs synchronize schema and other metadata through HTTP protocol, then local cluster BE nodes access the Remote Doris cluster through Arrow Flight interface.
+In this mode, during cross-cluster queries, FEs synchronize metadata such as Schema through HTTP protocol, and then BE nodes of the local cluster access the Remote Doris cluster through the Arrow Flight interface.
-**Advantages**: Minimal overhead on FE, execution plan only generates query SQL to send to remote cluster
+**Advantages**: Almost no overhead for FE, as the execution plan only generates query SQL to be sent to the remote cluster.
-**Disadvantages**: May not be able to utilize various optimization features of Doris internal tables, such as aggregation pushdown, limited predicate pushdown, etc.
+**Disadvantages**: May not be able to leverage various optimization features of Doris internal tables, such as aggregate pushdown, limited predicate pushdown, etc.
### Virtual Cluster Mode
-> Supported since 4.0.3.
+> Supported since version 4.0.3.
-When the `use_arrow_flight` property is `false`, it operates in virtual cluster mode.
+When the `use_arrow_flight` property is set to `false`, it is in Virtual Cluster mode.
-> Currently, this mode only support compute-storage coupled Doris cluster.
+> This mode currently only supports Doris clusters deployed in storage-compute coupled mode.

In this mode, during cross-cluster queries, Backend nodes in the Remote Doris cluster are treated as virtual nodes for query planning.
-FEs synchronize schema and other metadata through HTTP protocol. BEs directly transfer data through internal communication protocol.
+FEs synchronize metadata such as Schema through HTTP protocol. BEs directly transfer data through internal communication protocols.
+
+**Advantages**: Can basically leverage all optimization features of Doris internal table queries. Query execution flow is consistent with single-cluster internal flow.
-**Advantages**: Can basically utilize all optimization features of Doris internal table queries. Query execution process is consistent with single-cluster internal process.
+**Disadvantages**: For large remote tables, all information of the remote table (partition information, replica information) will be retrieved. FE memory overhead will increase, requiring expansion of FE memory. When cluster versions are inconsistent, such as higher version querying lower version, query failures may occur.
-**Disadvantages**: For large remote tables, it will obtain all information of remote tables (partition information, replica information). FE memory overhead will increase, requiring FE memory expansion. When cluster versions are inconsistent, such as higher version querying lower version, query failures may occur.
+> Since version 4.1, Virtual Cluster mode supports insert loading functionality.
## Column Type Mapping
### Arrow Flight Mode
-The supported column types and table types in this mode depend on the capabilities of Arrow Flight SQL. Currently, it has the following capabilities and limitations:
+The column types and table types supported in this mode depend on the support capabilities of Arrow Flight SQL. Currently, the following capabilities and limitations exist:
-- Supports all primitive types
-- Supports all nested types (Array, Map, Struct)
-- Does not support hll, bitmap, and variant types
-- Supports all table models (detail tables, aggregate tables, and primary key tables)
+- Supports all primitive types.
+- Supports all nested types (Array, Map, Struct).
+- Does not support hll, bitmap, variant types.
+- Supports all table models (Duplicate, Aggregate, and Unique tables).
### Virtual Cluster Mode
-In virtual cluster mode, all column types and all table models (detail tables, aggregate tables, and primary key tables) are supported.
+In Virtual Cluster mode, all column types and all table models (Duplicate, Aggregate, and Unique tables) are supported.
## Query Operations
-After configuring the Catalog, you can query table data in the Catalog through the following methods:
+After configuring the Catalog, you can query table data in the Catalog using the following methods:
```sql
-- 1. switch to catalog, use database and query
@@ -159,7 +161,7 @@ SELECT * FROM doris_ctl.doris_db.doris_tbl LIMIT 10;
### Arrow Flight Mode
-In this mode, Doris will try to push down predicate or function conditions and concatenate them into the generated SQL.
+In this mode, Doris will try to push down predicates or function conditions and concatenate them into the generated SQL.
You can view the generated SQL statement through EXPLAIN SQL.
@@ -174,9 +176,9 @@ You can view the generated SQL statement through EXPLAIN SQL.
### Virtual Cluster Mode
-In this mode, the execution plan still shows `VOlapScanNode`.
+In this mode, what you see in the execution plan is still `VOlapScanNode`.
-Various optimizations for internal table queries in Doris can continue to be utilized, such as Join Runtime Filter.
+Various optimizations of Doris for internal table queries can continue to be utilized, such as Join Runtime Filter.
```sql
MySQL [(none)]> explain select * from demo.inner_table a join edoris.external.example_tbl_duplicate b on (a.log_type = b.log_type) where error_code=2;
@@ -244,3 +246,43 @@ MySQL [(none)]> explain select * from demo.inner_table a join edoris.external.ex
| cardinality=1, avgRowSize=7425.0, numNodes=1
|
| pushAggOp=NONE
```
+
+## Write Operations
+
+> Supported since version 4.1.
+
+After configuring the Catalog, you can import data into tables in the Catalog in Virtual Cluster mode using the following insert methods:
+
+```sql
+-- 1. switch to catalog, use database and insert
+SWITCH doris_ctl;
+USE doris_db;
+insert into doris_tbl values (1,2);
+
+-- 2. use doris database directly
+USE doris_ctl.doris_db;
+insert into doris_tbl values (1,2);
+
+-- 3. use full qualified name to insert
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+```
+
+### Supported Import Forms
+
+```sql
+-- 1. insert into values
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+
+-- 2. insert into select
+insert into doris_ctl.doris_db.doris_tbl select * from doris_db.doris_tbl;
+
+-- 3. insert overwrite
+insert overwrite table doris_ctl.doris_db.doris_tbl select * from doris_db.doris_tbl;
+```
+
+### Unsupported Import Capabilities
+
+After configuring the Catalog, the following import capabilities are currently not supported:
+
+- [Group Commit](../../data-operate/import/group-commit-manual)
+- [Transactions](../../data-operate/transaction)
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/doris-catalog.mdx b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/doris-catalog.mdx
index 7183d06df77..756e9145460 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/doris-catalog.mdx
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/doris-catalog.mdx
@@ -2,7 +2,7 @@
{
"title": "Doris Catalog",
"language": "zh-CN",
- "description": "对多个 Doris 集群进行跨集群的联邦分析。"
+ "description": "Doris Catalog 支持对多个 Doris 集群进行跨集群联邦分析。支持 Arrow Flight 和虚拟集群两种模式,实现高效的多集群数据查询、写入和联邦分析,适用于分布式数据仓库场景。"
}
---
@@ -40,54 +40,53 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
);
```
-* `fe_http_hosts`
+* `fe_http_hosts`
- 远端 Doris 集群 FE HTTP 服务端点列表。
+ 远端 Doris 集群 FE HTTP 服务端点列表。
-* `fe_arrow_hosts`
+* `fe_arrow_hosts`
- 远端 Doris 集群 FE Arrow Flight 服务端点列表。
+ 远端 Doris 集群 FE Arrow Flight 服务端点列表。
-* `fe_thrift_hosts`
+* `fe_thrift_hosts`
远端 Doris 集群 FE Thrift 服务端点列表。
> 在 4.0.2 版本中,请填写 Master FE 的地址。该问题将在后续版本修复。
-* `use_arrow_flight`
+* `use_arrow_flight`
- 采用 Arrow Flight 方式访问远端 Doris 集群还是将远端表当做内表执行计划发送给远端 Doris 集群执行
+ 采用 Arrow Flight 方式访问远端 Doris 集群,还是将远端表当做内表执行计划发送给远端 Doris 集群执行。
-* `{QueryProperties}`
+* `{QueryProperties}`
- 可选属性
-
- | 参数名称 | 说明 | 默认值 |
- |-----------------------------|------------------------------------------------------------------------------------------|-------|
- | `enable_parallel_result_sink` | 开启后,本地 Doris BE 节点将并行地从远端 Doris 集群各 BE 节点拉取数据。(针对 Arrow Flight 方式) | true |
- | `query_retry_count` | 向远端 Doris 发送查询请求失败的最大重试次数。(不包含请求被接受后,远端 Doris 异步执行过程中可能发生的失败) | 3 |
- | `query_timeout_sec` | 向远端 Doris 发送查询的超时时间。(不包含请求被接受后,远端 Doris 异步执行时间) | 15 |
- | `compatible` | 用于在访问版本低于本集群的远端 Doris 时,尝试兼容其元数据格式。集群版本一致时无需开启。 | false |
+ 可选属性。
+ | 参数名称 | 说明 | 默认值 |
+ |----------------------------------|-----------------------------------------------------------------------------------------------|--------|
+ | `enable_parallel_result_sink` | 开启后,本地 Doris BE 节点将并行地从远端 Doris 集群各 BE 节点拉取数据。(针对 Arrow Flight 方式) | true |
+ | `query_retry_count` | 向远端 Doris 发送查询请求失败的最大重试次数。(不包含请求被接受后,远端 Doris 异步执行过程中可能发生的失败)| 3 |
+ | `query_timeout_sec` | 向远端 Doris 发送查询的超时时间。(不包含请求被接受后,远端 Doris 异步执行时间) | 15 |
+ | `compatible` | 用于在访问版本低于本集群的远端 Doris 时,尝试兼容其元数据格式。集群版本一致时无需开启。 | false |
-* `{HttpClientProperties}`
+* `{HttpClientProperties}`
- HttpClientProperties 部分用于配置 HTTP Client 相关参数,该 Client 用于发送 HTTP 请求同步远端集群元数据。这些都是可选参数。
-
- | 参数名称 | 说明 | 默认值 |
- |----------------------------------|--------------------------------------------|-------|
- | `metadata_http_ssl_enabled` | HTTP 元数据同步,是否启用 SSL/TLS 加密通信。 | false |
- | `metadata_sync_retry_count` | HTTP HTTP 请求失败最大重试次数 | 3 |
- | `metadata_max_idle_connections` | HTTP 元数据同步,客户端最大空闲连接数 | 5 |
- | `metadata_keep_alive_duration_sec` | HTTP 元数据同步,客户端空闲连接存活时长 | 300 |
- | `metadata_connect_timeout_sec` | HTTP 元数据同步,客户端 TCP 连接超时时间 | 10 |
- | `metadata_read_timeout_sec` | HTTP 元数据同步,客户端 socket read timeout | 10 |
- | `metadata_write_timeout_sec` | HTTP 元数据同步,客户端 socket write timeout | 10 |
- | `metadata_call_timeout_sec` | HTTP 元数据同步,客户端 HTTP 请求总超时时间 | 10 |
+ HttpClientProperties 部分用于配置 HTTP Client 相关参数,该 Client 用于发送 HTTP 请求同步远端集群元数据。这些都是可选参数。
-* `{CommonProperties}`
+ | 参数名称 | 说明 | 默认值 |
+ |-------------------------------------|-----------------------------------------------------|--------|
+ | `metadata_http_ssl_enabled` | HTTP 元数据同步,是否启用 SSL/TLS 加密通信。 | false |
+ | `metadata_sync_retry_count` | HTTP 请求失败最大重试次数。 | 3 |
+ | `metadata_max_idle_connections` | HTTP 元数据同步,客户端最大空闲连接数。 | 5 |
+ | `metadata_keep_alive_duration_sec` | HTTP 元数据同步,客户端空闲连接存活时长。 | 300 |
+ | `metadata_connect_timeout_sec` | HTTP 元数据同步,客户端 TCP 连接超时时间。 | 10 |
+ | `metadata_read_timeout_sec` | HTTP 元数据同步,客户端 socket read timeout。 | 10 |
+ | `metadata_write_timeout_sec` | HTTP 元数据同步,客户端 socket write timeout。 | 10 |
+ | `metadata_call_timeout_sec` | HTTP 元数据同步,客户端 HTTP 请求总超时时间。 | 10 |
- CommonProperties 部分用于填写通用属性。请参阅 数据目录概述 中【通用属性】部分。
+* `{CommonProperties}`
+
+ CommonProperties 部分用于填写通用属性。请参阅数据目录概述中【通用属性】部分。
## 访问模式
@@ -101,7 +100,7 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
在该模式下进行跨集群查询时,FE 之间通过 HTTP 协议同步 Schema 等元信息,然后本地集群的 BE 节点,通过 Arrow Flight 接口访问 Remote Doris 集群。
-**优点**:对于 FE 基本没开销,执行计划仅生成查询 SQL 发往远端集群
+**优点**:对于 FE 基本无开销,执行计划仅生成查询 SQL 发往远端集群。
**缺点**:可能无法利用 Doris 内表的各种优化特性,如聚合下推、有限的谓词下推等。
@@ -121,7 +120,9 @@ FE 之间通过 HTTP 协议同步 Schema 等元信息。BE 直接通过内部通
**优点**:基本可以利用 Doris 内表查询的所有优化特性。查询执行流程和单集群内部流程一致。
-**缺点**:对于较大的远端表来说,会获取远端表的所有信息 (分区信息,副本信息)。FE 的内存开销会上升,需要扩大 FE 内存。在各集群版本不一致时,比如高版本查询低版本,可能会出现查询失败。
+**缺点**:对于较大的远端表来说,会获取远端表的所有信息(分区信息、副本信息)。FE 的内存开销会上升,需要扩大 FE 内存。在各集群版本不一致时,比如高版本查询低版本,可能会出现查询失败。
+
+> 自 4.1 版本,虚拟集群模式支持 insert 导入功能。
## 列类型映射
@@ -129,10 +130,10 @@ FE 之间通过 HTTP 协议同步 Schema 等元信息。BE 直接通过内部通
该模式下支持的列类型和表类型,取决于 Arrow Flight SQL 的支持能力,目前有以下能力和限制:
-- 支持所有基础类型(Primitive Type)
-- 支持所有嵌套类型(Array、Map、Struct)
-- 不支持 hll、bitmap、variant 类型
-- 支持所有的表模式(明细表、聚合表和主键表)
+- 支持所有基础类型(Primitive Type)。
+- 支持所有嵌套类型(Array、Map、Struct)。
+- 不支持 hll、bitmap、variant 类型。
+- 支持所有的表模式(明细表、聚合表和主键表)。
### 虚拟集群模式
@@ -245,3 +246,43 @@ MySQL [(none)]> explain select * from demo.inner_table a join edoris.external.ex
| cardinality=1, avgRowSize=7425.0, numNodes=1
|
| pushAggOp=NONE
```
+
+## 写入操作
+
+> 自 4.1 版本支持。
+
+配置好 Catalog 后,可以通过以下的 insert 方式向虚拟集群模式下的 Catalog 中的表中导入数据:
+
+```sql
+-- 1. switch to catalog, use database and query
+SWITCH doris_ctl;
+USE doris_db;
+insert into doris_tbl values (1,2);
+
+-- 2. use doris database directly
+USE doris_ctl.doris_db;
+insert into doris_tbl values (1,2);
+
+-- 3. use full qualified name to query
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+```
+
+### 可以支持的导入形式
+
+```sql
+-- 1.insert into values
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+
+-- 2.insert into select
+insert into doris_ctl.doris_db.doris_tbl select * from doris_db.doris_tbl;
+
+-- 3. insert overwrite
+insert overwrite table doris_ctl.doris_db.doris_tbl select * from doris_db.doris_tbl;
+```
+
+### 不支持的导入能力
+
+配置好 Catalog 后,以下的导入能力目前不被支持:
+
+- [Group Commit](../../data-operate/import/group-commit-manual)
+- [事务](../../data-operate/transaction)
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx
index 7183d06df77..756e9145460 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx
@@ -2,7 +2,7 @@
{
"title": "Doris Catalog",
"language": "zh-CN",
- "description": "对多个 Doris 集群进行跨集群的联邦分析。"
+ "description": "Doris Catalog 支持对多个 Doris 集群进行跨集群联邦分析。支持 Arrow Flight 和虚拟集群两种模式,实现高效的多集群数据查询、写入和联邦分析,适用于分布式数据仓库场景。"
}
---
@@ -40,54 +40,53 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
);
```
-* `fe_http_hosts`
+* `fe_http_hosts`
- 远端 Doris 集群 FE HTTP 服务端点列表。
+ 远端 Doris 集群 FE HTTP 服务端点列表。
-* `fe_arrow_hosts`
+* `fe_arrow_hosts`
- 远端 Doris 集群 FE Arrow Flight 服务端点列表。
+ 远端 Doris 集群 FE Arrow Flight 服务端点列表。
-* `fe_thrift_hosts`
+* `fe_thrift_hosts`
远端 Doris 集群 FE Thrift 服务端点列表。
> 在 4.0.2 版本中,请填写 Master FE 的地址。该问题将在后续版本修复。
-* `use_arrow_flight`
+* `use_arrow_flight`
- 采用 Arrow Flight 方式访问远端 Doris 集群还是将远端表当做内表执行计划发送给远端 Doris 集群执行
+ 采用 Arrow Flight 方式访问远端 Doris 集群,还是将远端表当做内表执行计划发送给远端 Doris 集群执行。
-* `{QueryProperties}`
+* `{QueryProperties}`
- 可选属性
-
- | 参数名称 | 说明 | 默认值 |
- |-----------------------------|------------------------------------------------------------------------------------------|-------|
- | `enable_parallel_result_sink` | 开启后,本地 Doris BE 节点将并行地从远端 Doris 集群各 BE 节点拉取数据。(针对 Arrow Flight 方式) | true |
- | `query_retry_count` | 向远端 Doris 发送查询请求失败的最大重试次数。(不包含请求被接受后,远端 Doris 异步执行过程中可能发生的失败) | 3 |
- | `query_timeout_sec` | 向远端 Doris 发送查询的超时时间。(不包含请求被接受后,远端 Doris 异步执行时间) | 15 |
- | `compatible` | 用于在访问版本低于本集群的远端 Doris 时,尝试兼容其元数据格式。集群版本一致时无需开启。 | false |
+ 可选属性。
+ | 参数名称 | 说明 | 默认值 |
+ |----------------------------------|-----------------------------------------------------------------------------------------------|--------|
+ | `enable_parallel_result_sink` | 开启后,本地 Doris BE 节点将并行地从远端 Doris 集群各 BE 节点拉取数据。(针对 Arrow Flight 方式) | true |
+ | `query_retry_count` | 向远端 Doris 发送查询请求失败的最大重试次数。(不包含请求被接受后,远端 Doris 异步执行过程中可能发生的失败)| 3 |
+ | `query_timeout_sec` | 向远端 Doris 发送查询的超时时间。(不包含请求被接受后,远端 Doris 异步执行时间) | 15 |
+ | `compatible` | 用于在访问版本低于本集群的远端 Doris 时,尝试兼容其元数据格式。集群版本一致时无需开启。 | false |
-* `{HttpClientProperties}`
+* `{HttpClientProperties}`
- HttpClientProperties 部分用于配置 HTTP Client 相关参数,该 Client 用于发送 HTTP 请求同步远端集群元数据。这些都是可选参数。
-
- | 参数名称 | 说明 | 默认值 |
- |----------------------------------|--------------------------------------------|-------|
- | `metadata_http_ssl_enabled` | HTTP 元数据同步,是否启用 SSL/TLS 加密通信。 | false |
- | `metadata_sync_retry_count` | HTTP HTTP 请求失败最大重试次数 | 3 |
- | `metadata_max_idle_connections` | HTTP 元数据同步,客户端最大空闲连接数 | 5 |
- | `metadata_keep_alive_duration_sec` | HTTP 元数据同步,客户端空闲连接存活时长 | 300 |
- | `metadata_connect_timeout_sec` | HTTP 元数据同步,客户端 TCP 连接超时时间 | 10 |
- | `metadata_read_timeout_sec` | HTTP 元数据同步,客户端 socket read timeout | 10 |
- | `metadata_write_timeout_sec` | HTTP 元数据同步,客户端 socket write timeout | 10 |
- | `metadata_call_timeout_sec` | HTTP 元数据同步,客户端 HTTP 请求总超时时间 | 10 |
+ HttpClientProperties 部分用于配置 HTTP Client 相关参数,该 Client 用于发送 HTTP 请求同步远端集群元数据。这些都是可选参数。
-* `{CommonProperties}`
+ | 参数名称 | 说明 | 默认值 |
+ |-------------------------------------|-----------------------------------------------------|--------|
+ | `metadata_http_ssl_enabled` | HTTP 元数据同步,是否启用 SSL/TLS 加密通信。 | false |
+ | `metadata_sync_retry_count` | HTTP 请求失败最大重试次数。 | 3 |
+ | `metadata_max_idle_connections` | HTTP 元数据同步,客户端最大空闲连接数。 | 5 |
+ | `metadata_keep_alive_duration_sec` | HTTP 元数据同步,客户端空闲连接存活时长。 | 300 |
+ | `metadata_connect_timeout_sec` | HTTP 元数据同步,客户端 TCP 连接超时时间。 | 10 |
+ | `metadata_read_timeout_sec` | HTTP 元数据同步,客户端 socket read timeout。 | 10 |
+ | `metadata_write_timeout_sec` | HTTP 元数据同步,客户端 socket write timeout。 | 10 |
+ | `metadata_call_timeout_sec` | HTTP 元数据同步,客户端 HTTP 请求总超时时间。 | 10 |
- CommonProperties 部分用于填写通用属性。请参阅 数据目录概述 中【通用属性】部分。
+* `{CommonProperties}`
+
+ CommonProperties 部分用于填写通用属性。请参阅数据目录概述中【通用属性】部分。
## 访问模式
@@ -101,7 +100,7 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
在该模式下进行跨集群查询时,FE 之间通过 HTTP 协议同步 Schema 等元信息,然后本地集群的 BE 节点,通过 Arrow Flight 接口访问 Remote Doris 集群。
-**优点**:对于 FE 基本没开销,执行计划仅生成查询 SQL 发往远端集群
+**优点**:对于 FE 基本无开销,执行计划仅生成查询 SQL 发往远端集群。
**缺点**:可能无法利用 Doris 内表的各种优化特性,如聚合下推、有限的谓词下推等。
@@ -121,7 +120,9 @@ FE 之间通过 HTTP 协议同步 Schema 等元信息。BE 直接通过内部通
**优点**:基本可以利用 Doris 内表查询的所有优化特性。查询执行流程和单集群内部流程一致。
-**缺点**:对于较大的远端表来说,会获取远端表的所有信息 (分区信息,副本信息)。FE 的内存开销会上升,需要扩大 FE 内存。在各集群版本不一致时,比如高版本查询低版本,可能会出现查询失败。
+**缺点**:对于较大的远端表来说,会获取远端表的所有信息(分区信息、副本信息)。FE 的内存开销会上升,需要扩大 FE 内存。在各集群版本不一致时,比如高版本查询低版本,可能会出现查询失败。
+
+> 自 4.1 版本,虚拟集群模式支持 insert 导入功能。
## 列类型映射
@@ -129,10 +130,10 @@ FE 之间通过 HTTP 协议同步 Schema 等元信息。BE 直接通过内部通
该模式下支持的列类型和表类型,取决于 Arrow Flight SQL 的支持能力,目前有以下能力和限制:
-- 支持所有基础类型(Primitive Type)
-- 支持所有嵌套类型(Array、Map、Struct)
-- 不支持 hll、bitmap、variant 类型
-- 支持所有的表模式(明细表、聚合表和主键表)
+- 支持所有基础类型(Primitive Type)。
+- 支持所有嵌套类型(Array、Map、Struct)。
+- 不支持 hll、bitmap、variant 类型。
+- 支持所有的表模式(明细表、聚合表和主键表)。
### 虚拟集群模式
@@ -245,3 +246,43 @@ MySQL [(none)]> explain select * from demo.inner_table a join edoris.external.ex
| cardinality=1, avgRowSize=7425.0, numNodes=1
|
| pushAggOp=NONE
```
+
+## 写入操作
+
+> 自 4.1 版本支持。
+
+配置好 Catalog 后,可以通过以下的 insert 方式向虚拟集群模式下的 Catalog 中的表中导入数据:
+
+```sql
+-- 1. switch to catalog, use database and query
+SWITCH doris_ctl;
+USE doris_db;
+insert into doris_tbl values (1,2);
+
+-- 2. use doris database directly
+USE doris_ctl.doris_db;
+insert into doris_tbl values (1,2);
+
+-- 3. use full qualified name to query
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+```
+
+### 可以支持的导入形式
+
+```sql
+-- 1.insert into values
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+
+-- 2.insert into select
+insert into doris_ctl.doris_db.doris_tbl select * from doris_db.doris_tbl;
+
+-- 3. insert overwrite
+insert overwrite table doris_ctl.doris_db.doris_tbl select * from doris_db.doris_tbl;
+```
+
+### 不支持的导入能力
+
+配置好 Catalog 后,以下的导入能力目前不被支持:
+
+- [Group Commit](../../data-operate/import/group-commit-manual)
+- [事务](../../data-operate/transaction)
diff --git a/versioned_docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx b/versioned_docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx
index d5c902d0800..07f51e22456 100644
--- a/versioned_docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx
+++ b/versioned_docs/version-4.x/lakehouse/catalogs/doris-catalog.mdx
@@ -2,7 +2,7 @@
{
"title": "Doris Catalog",
"language": "en",
- "description": "Doris Catalog enables cross-cluster federated analysis across multiple Doris clusters via Arrow Flight or virtual cluster modes, with configuration, type mapping and query optimization guides."
+ "description": "Doris Catalog supports cross-cluster federated analysis across multiple Doris clusters. It supports both Arrow Flight and Virtual Cluster modes, enabling efficient multi-cluster data querying, writing, and federated analysis for distributed data warehouse scenarios."
}
---
@@ -19,9 +19,9 @@ This is an experimental feature.
Perform cross-cluster federated analysis across multiple Doris clusters.
-Unlike connecting to other Doris clusters through JDBC Catalog, this solution enables efficient multi-Doris cluster federated analysis through Arrow Flight or virtual cluster mode.
+Unlike connecting to other Doris clusters through JDBC Catalog, this solution enables efficient federated analysis across multiple Doris clusters through Arrow Flight or Virtual Cluster mode.
-## Configuring Catalog
+## Configure Catalog
### Syntax
@@ -40,106 +40,108 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
);
```
-* `fe_http_hosts`
+* `fe_http_hosts`
- List of remote Doris cluster FE HTTP service endpoints.
+ List of FE HTTP service endpoints for the remote Doris cluster.
-* `fe_arrow_hosts`
+* `fe_arrow_hosts`
- List of remote Doris cluster FE Arrow Flight service endpoints.
+ List of FE Arrow Flight service endpoints for the remote Doris cluster.
-* `fe_thrift_hosts`
+* `fe_thrift_hosts`
- List of remote Doris cluster FE Thrift service endpoints.
+ List of FE Thrift service endpoints for the remote Doris cluster.
- > Note: In version 4.0.2, user Master FE's address. This issue will be fixed in next version.
+ > In version 4.0.2, please fill in the Master FE address. This issue will be fixed in subsequent versions.
-* `use_arrow_flight`
+* `use_arrow_flight`
- Whether to access the remote Doris cluster using Arrow Flight or treat remote tables as internal tables and send execution plans to the remote Doris cluster for execution.
+ Whether to access the remote Doris cluster via Arrow Flight, or send the execution plan to the remote Doris cluster as if the remote table were an internal table.
-* `{QueryProperties}`
+* `{QueryProperties}`
- Optional properties
-
- | Parameter Name | Description | Default Value |
- |----------------|-------------|---------------|
- | `enable_parallel_result_sink` | When enabled, local Doris BE nodes will pull data in parallel from each BE node of the remote Doris cluster. (For Arrow Flight mode) | true |
- | `query_retry_count` | Maximum retry count for failed query requests to remote Doris. (Does not include failures that may occur during asynchronous execution after the request is accepted by remote Doris) | 3 |
- | `query_timeout_sec` | Timeout for sending queries to remote Doris. (Does not include asynchronous execution time after the request is accepted by remote Doris) | 15 |
- | `compatible` | Used to attempt compatibility with metadata formats when accessing remote Doris with versions lower than the local cluster. No need to enable when cluster versions are consistent. | false |
+ Optional properties.
-* `{HttpClientProperties}`
+ | Parameter Name | Description | Default |
+ |----------------------------------|-------------------------------------------------------------------------------------------------------|---------|
+ | `enable_parallel_result_sink` | When enabled, local Doris BE nodes will pull data in parallel from various BE nodes of the remote Doris cluster. (For Arrow Flight mode) | true |
+ | `query_retry_count` | Maximum number of retries for failed query requests sent to the remote Doris. (Does not include failures that may occur during asynchronous execution after the request is accepted) | 3 |
+ | `query_timeout_sec` | Timeout for sending queries to the remote Doris. (Does not include asynchronous execution time after the request is accepted) | 15 |
+ | `compatible` | Used to attempt compatibility with metadata format when accessing a remote Doris with a version lower than the local cluster. No need to enable when cluster versions are consistent. | false |
- HttpClientProperties section is used to configure HTTP Client related
parameters. This client is used to send HTTP requests to synchronize remote
cluster metadata. These are all optional parameters.
-
- | Parameter Name | Description | Default Value |
- |----------------|-------------|---------------|
- | `metadata_http_ssl_enabled` | Whether to enable SSL/TLS encrypted communication for HTTP metadata synchronization. | false |
- | `metadata_sync_retry_count` | Maximum retry count for HTTP request failures | 3 |
- | `metadata_max_idle_connections` | Maximum idle connections for HTTP metadata sync client | 5 |
- | `metadata_keep_alive_duration_sec` | Idle connection keep-alive duration for HTTP metadata sync client | 300 |
- | `metadata_connect_timeout_sec` | TCP connection timeout for HTTP metadata sync client | 10 |
- | `metadata_read_timeout_sec` | Socket read timeout for HTTP metadata sync client | 10 |
- | `metadata_write_timeout_sec` | Socket write timeout for HTTP metadata sync client | 10 |
- | `metadata_call_timeout_sec` | HTTP request total timeout for HTTP metadata sync client | 10 |
+* `{HttpClientProperties}`
-* `{CommonProperties}`
+ The HttpClientProperties section configures the HTTP client that sends
requests to synchronize metadata from the remote cluster. All of its
parameters are optional.
- CommonProperties section is used to fill in common properties. Please refer
to the [Common Properties] section in the Data Catalog Overview.
+ | Parameter Name | Description | Default |
+ |----------------|-------------|---------|
+ | `metadata_http_ssl_enabled` | Whether to enable SSL/TLS encrypted communication for HTTP metadata synchronization. | false |
+ | `metadata_sync_retry_count` | Maximum number of retries for failed HTTP requests. | 3 |
+ | `metadata_max_idle_connections` | Maximum number of idle connections for HTTP metadata synchronization client. | 5 |
+ | `metadata_keep_alive_duration_sec` | Idle connection keep-alive duration for HTTP metadata synchronization client. | 300 |
+ | `metadata_connect_timeout_sec` | TCP connection timeout for HTTP metadata synchronization client. | 10 |
+ | `metadata_read_timeout_sec` | Socket read timeout for HTTP metadata synchronization client. | 10 |
+ | `metadata_write_timeout_sec` | Socket write timeout for HTTP metadata synchronization client. | 10 |
+ | `metadata_call_timeout_sec` | Total HTTP request timeout for HTTP metadata synchronization client. | 10 |
+
+* `{CommonProperties}`
+
+ The CommonProperties section is used to fill in common properties. Please
refer to the [Common Properties] section in the Data Catalog Overview.
## Access Modes
### Arrow Flight Mode
-> Supported since 4.0.2.
+> Supported since version 4.0.2.
-When the `use_arrow_flight` property is `true`, it operates in Arrow Flight
mode.
+When the `use_arrow_flight` property is set to `true`, the Catalog operates in
Arrow Flight mode.

-In this mode, during cross-cluster queries, FEs synchronize schema and other
metadata through HTTP protocol, then local cluster BE nodes access the Remote
Doris cluster through Arrow Flight interface.
+In this mode, during cross-cluster queries, FEs synchronize metadata such as
Schema through HTTP protocol, and then BE nodes of the local cluster access the
Remote Doris cluster through the Arrow Flight interface.
-**Advantages**: Minimal overhead on FE, execution plan only generates query
SQL to send to remote cluster
+**Advantages**: Almost no overhead for FE, as the execution plan only
generates query SQL to be sent to the remote cluster.
-**Disadvantages**: May not be able to utilize various optimization features of
Doris internal tables, such as aggregation pushdown, limited predicate
pushdown, etc.
+**Disadvantages**: May not be able to leverage various optimization features
of Doris internal tables, such as aggregate pushdown, limited predicate
pushdown, etc.
### Virtual Cluster Mode
-> Supported since 4.0.3.
+> Supported since version 4.0.3.
-When the `use_arrow_flight` property is `false`, it operates in virtual
cluster mode.
+When the `use_arrow_flight` property is set to `false`, the Catalog operates in
Virtual Cluster mode.
-> Currently, this mode only support compute-storage coupled Doris cluster.
+> This mode currently only supports Doris clusters deployed in storage-compute
coupled mode.

In this mode, during cross-cluster queries, Backend nodes in the Remote Doris
cluster are treated as virtual nodes for query planning.
-FEs synchronize schema and other metadata through HTTP protocol. BEs directly
transfer data through internal communication protocol.
+FEs synchronize metadata such as Schema through HTTP protocol. BEs directly
transfer data through internal communication protocols.
+
+**Advantages**: Can leverage nearly all optimization features of Doris
internal table queries. The query execution flow is the same as within a
single cluster.
-**Advantages**: Can basically utilize all optimization features of Doris
internal table queries. Query execution process is consistent with
single-cluster internal process.
+**Disadvantages**: All metadata of a remote table (partition information,
replica information) is retrieved, so for large remote tables FE memory
overhead increases and FE memory must be expanded accordingly. When cluster
versions differ, for example a higher version querying a lower version,
queries may fail.
-**Disadvantages**: For large remote tables, it will obtain all information of
remote tables (partition information, replica information). FE memory overhead
will increase, requiring FE memory expansion. When cluster versions are
inconsistent, such as higher version querying lower version, query failures may
occur.
+> Since version 4.1, Virtual Cluster mode also supports writing data with
`INSERT`.
## Column Type Mapping
### Arrow Flight Mode
-The supported column types and table types in this mode depend on the
capabilities of Arrow Flight SQL. Currently, it has the following capabilities
and limitations:
+The column types and table types supported in this mode depend on the
capabilities of Arrow Flight SQL. The current capabilities and limitations
are:
-- Supports all primitive types
-- Supports all nested types (Array, Map, Struct)
-- Does not support hll, bitmap, and variant types
-- Supports all table models (detail tables, aggregate tables, and primary key
tables)
+- Supports all primitive types.
+- Supports all nested types (Array, Map, Struct).
+- Does not support hll, bitmap, and variant types.
+- Supports all table models (Duplicate, Aggregate, and Unique tables).
### Virtual Cluster Mode
-In virtual cluster mode, all column types and all table models (detail tables,
aggregate tables, and primary key tables) are supported.
+In Virtual Cluster mode, all column types and all table models (Duplicate,
Aggregate, and Unique tables) are supported.
## Query Operations
-After configuring the Catalog, you can query table data in the Catalog through
the following methods:
+After configuring the Catalog, you can query table data in the Catalog using
the following methods:
```sql
-- 1. switch to catalog, use database and query
@@ -159,7 +161,7 @@ SELECT * FROM doris_ctl.doris_db.doris_tbl LIMIT 10;
### Arrow Flight Mode
-In this mode, Doris will try to push down predicate or function conditions and
concatenate them into the generated SQL.
+In this mode, Doris will try to push down predicates or function conditions
and concatenate them into the generated SQL.
You can view the generated SQL statement through EXPLAIN SQL.
@@ -174,9 +176,9 @@ You can view the generated SQL statement through EXPLAIN
SQL.
### Virtual Cluster Mode
-In this mode, the execution plan still shows `VOlapScanNode`.
+In this mode, the scan node shown in the execution plan is still
`VOlapScanNode`.
-Various optimizations for internal table queries in Doris can continue to be
utilized, such as Join Runtime Filter.
+Doris optimizations for internal table queries, such as Join Runtime Filter,
continue to apply.
```sql
MySQL [(none)]> explain select * from demo.inner_table a join
edoris.external.example_tbl_duplicate b on (a.log_type = b.log_type) where
error_code=2;
@@ -244,3 +246,43 @@ MySQL [(none)]> explain select * from demo.inner_table a
join edoris.external.ex
| cardinality=1, avgRowSize=7425.0, numNodes=1
|
| pushAggOp=NONE
```
+
+## Write Operations
+
+> Supported since version 4.1.
+
+After configuring the Catalog in Virtual Cluster mode, you can write data into
tables in the Catalog using `INSERT` in any of the following ways:
+
+```sql
+-- 1. switch to catalog, use database and insert
+SWITCH doris_ctl;
+USE doris_db;
+insert into doris_tbl values (1,2);
+
+-- 2. use doris database directly
+USE doris_ctl.doris_db;
+insert into doris_tbl values (1,2);
+
+-- 3. use fully qualified name to insert
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+```
+
+### Supported Import Forms
+
+```sql
+-- 1. insert into values
+insert into doris_ctl.doris_db.doris_tbl values (1,2);
+
+-- 2. insert into select
+insert into doris_ctl.doris_db.doris_tbl select * from doris_db.doris_tbl;
+
+-- 3. insert overwrite
+insert overwrite table doris_ctl.doris_db.doris_tbl select * from
doris_db.doris_tbl;
+```
+
+### Unsupported Import Capabilities
+
+Writes through the Catalog currently do not support the following import
capabilities:
+
+- [Group Commit](../../data-operate/import/group-commit-manual)
+- [Transactions](../../data-operate/transaction)