This is an automated email from the ASF dual-hosted git repository.
diwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 36b4fe0630b [doc](streamingjob) fix pg mysql doc (#3502)
36b4fe0630b is described below
commit 36b4fe0630be7ae53db2d27bb90a3b8ce7f5f6a6
Author: wudi <[email protected]>
AuthorDate: Mon Mar 30 15:34:21 2026 +0800
[doc](streamingjob) fix pg mysql doc (#3502)
## Versions
- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
.../streaming-job/streaming-job-multi-table.md | 93 ++++++++++++---------
.../streaming-job/streaming-job-multi-table.md | 97 ++++++++++++----------
.../streaming-job/streaming-job-multi-table.md | 95 +++++++++++----------
.../streaming-job/streaming-job-multi-table.md | 93 ++++++++++++---------
4 files changed, 212 insertions(+), 166 deletions(-)
diff --git a/docs/data-operate/import/streaming-job/streaming-job-multi-table.md b/docs/data-operate/import/streaming-job/streaming-job-multi-table.md
index 5ac0d6ea910..f706443e3cc 100644
--- a/docs/data-operate/import/streaming-job/streaming-job-multi-table.md
+++ b/docs/data-operate/import/streaming-job/streaming-job-multi-table.md
@@ -1,35 +1,28 @@
---
{
- "title": "Postgres/MySQL Continuous Load",
+ "title": "MySQL/PostgreSQL Continuous Load",
"language": "en",
- "description": "Doris can continuously synchronize full and incremental data from multiple tables in MySQL, Postgres, etc. to Doris using Streaming Job."
+ "description": "Doris can continuously synchronize full and incremental data from multiple tables in MySQL, PostgreSQL, etc. to Doris using Streaming Job."
}
---
## Overview
-Supports using Job to continuously synchronize full and incremental data from multiple tables in MySQL, Postgres, etc. to Doris via Streaming Job. Suitable for scenarios requiring real-time multi-table data synchronization to Doris.
+Supports using Job to continuously synchronize full and incremental data from multiple tables in MySQL, PostgreSQL, etc. to Doris via Streaming Job. Suitable for scenarios requiring real-time multi-table data synchronization to Doris.
-## Supported Data Sources
-
-- MySQL
-- Postgres
-
-## Basic Principles
-
-By integrating [Flink CDC](https://github.com/apache/flink-cdc), Doris supports reading change logs from MySQL, Postgres, etc., enabling full and incremental multi-table data synchronization. When synchronizing for the first time, Doris automatically creates downstream tables (primary key tables) and keeps the primary key consistent with the upstream.
+By integrating [Flink CDC](https://github.com/apache/flink-cdc), Doris supports reading change logs from MySQL, PostgreSQL, etc., enabling full and incremental multi-table data synchronization. When synchronizing for the first time, Doris automatically creates downstream tables (primary key tables) and keeps the primary key consistent with the upstream.
**Notes:**
1. Currently only at-least-once semantics are guaranteed.
2. Only primary key tables are supported for synchronization.
3. LOAD privilege is required. If the downstream table does not exist, CREATE privilege is also required.
+4. During automatic table creation, if the target table already exists, it will be skipped, and users can customize tables according to different scenarios.
-## Quick Start
+## MySQL Continuous Load
### Prerequisites
-#### MySQL
Enable Binlog on MySQL by adding the following to my.cnf:
```ini
log-bin=mysql-bin
@@ -37,16 +30,8 @@ binlog_format=ROW
server-id=1
```
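[Editor's note, not part of the patch: after restarting MySQL with the my.cnf settings above, the configuration can be verified from any client session. A minimal check:]

```sql
-- Confirm the binlog is enabled with row-based format and the server-id is set
SHOW VARIABLES LIKE 'log_bin';        -- expect: ON
SHOW VARIABLES LIKE 'binlog_format';  -- expect: ROW
SHOW VARIABLES LIKE 'server_id';      -- expect: 1
```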
-#### Postgres
-Enable logical replication on Postgres by adding the following to postgresql.conf:
-```ini
-wal_level=logical
-```
-
### Creating an Import Job
-#### MySQL
-
```sql
CREATE JOB multi_table_sync
ON STREAMING
@@ -61,11 +46,35 @@ FROM MYSQL (
"offset" = "initial"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- Set to 1 for single BE deployment
)
```
-#### Postgres
+### MySQL Source Parameters
+
+| Parameter | Default | Description |
+| ------------- | ------- | ------------------------------------------- |
+| jdbc_url | - | MySQL JDBC connection string |
+| driver_url | - | JDBC driver jar path |
+| driver_class | - | JDBC driver class name |
+| user | - | Database username |
+| password | - | Database password |
+| database | - | Database name |
+| include_tables| - | Tables to synchronize, comma separated |
+| offset | initial | initial: full + incremental, latest: incremental only |
+| snapshot_split_size | 8096 | The size (in number of rows) of each split. During full synchronization, a table will be divided into multiple splits for synchronization. |
+| snapshot_parallelism | 1 | The parallelism level during the full synchronization phase, i.e., the maximum number of splits a single task can schedule at once. |
+
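[Editor's note, not part of the patch: for illustration, the snapshot tuning parameters from the table above could be combined with the job syntax as follows. Job name, connection values, and table names are placeholders; driver_url/driver_class are omitted for brevity but may be required in a real deployment.]

```sql
CREATE JOB mysql_sync_tuned
ON STREAMING
FROM MYSQL (
  "jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
  "user" = "root",
  "password" = "****",
  "database" = "test_db",
  "include_tables" = "orders,customers",
  "offset" = "initial",
  "snapshot_split_size" = "8096",   -- rows per split during the full phase
  "snapshot_parallelism" = "2"      -- max splits a single task schedules at once
)
TO DATABASE target_test_db (
  "table.create.properties.replication_num" = "1"
)
```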
+## PostgreSQL Continuous Load
+
+### Prerequisites
+
+Enable logical replication on PostgreSQL by adding the following to postgresql.conf:
+```ini
+wal_level=logical
+```
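[Editor's note, not part of the patch: changing wal_level requires a PostgreSQL server restart. Once restarted, the setting can be confirmed from a client session:]

```sql
-- Verify logical replication is active
SHOW wal_level;  -- expect: logical
```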
+
+### Creating an Import Job
```sql
CREATE JOB test_postgres_job
@@ -82,10 +91,28 @@ FROM POSTGRES (
"offset" = "latest"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- Set to 1 for single BE deployment
)
```
+### PostgreSQL Source Parameters
+
+| Parameter | Default | Description |
+| ------------- | ------- | ------------------------------------------- |
+| jdbc_url | - | PostgreSQL JDBC connection string |
+| driver_url | - | JDBC driver jar path |
+| driver_class | - | JDBC driver class name |
+| user | - | Database username |
+| password | - | Database password |
+| database | - | Database name |
+| schema | - | Schema name |
+| include_tables| - | Tables to synchronize, comma separated. If not specified, all tables will be synchronized by default. |
+| offset | initial | initial: full + incremental, latest: incremental only |
+| snapshot_split_size | 8096 | The size (in number of rows) of each split. During full synchronization, a table will be divided into multiple splits for synchronization. |
+| snapshot_parallelism | 1 | The parallelism level during the full synchronization phase, i.e., the maximum number of splits a single task can schedule at once. |
+
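[Editor's note, not part of the patch: a sketch of a PostgreSQL job using the schema parameter from the table above. Job name, connection values, and table names are placeholders, and the jdbc_url format is assumed.]

```sql
CREATE JOB pg_sync_public
ON STREAMING
FROM POSTGRES (
  "jdbc_url" = "jdbc:postgresql://127.0.0.1:5432/test_db",
  "user" = "postgres",
  "password" = "****",
  "database" = "test_db",
  "schema" = "public",             -- PostgreSQL schema containing the tables
  "include_tables" = "orders,customers",
  "offset" = "initial"
)
TO DATABASE target_test_db (
  "table.create.properties.replication_num" = "1"
)
```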
+## Common Operations
+
### Check Import Status
```sql
@@ -162,7 +189,7 @@ TO DATABASE <target_db> (
| job_name | Job name |
| job_properties | General import parameters |
| comment | Job comment |
-| source_properties | Source (MySQL/PG) parameters|
+| source_properties | Source (MySQL/PostgreSQL) parameters|
| target_properties | Doris target DB parameters |
### Import Parameters
@@ -181,22 +208,6 @@ TO DATABASE <target_db> (
| ------------- | ------- | ------------------------------------------- |
| max_interval | 10s | Idle scheduling interval when no new data |
-#### Source Configuration Parameters
-
-| Parameter | Default | Description |
-| ------------- | ------- | ------------------------------------------- |
-| jdbc_url | - | JDBC connection string (MySQL/PG) |
-| driver_url | - | JDBC driver jar path |
-| driver_class | - | JDBC driver class name |
-| user | - | Database username |
-| password | - | Database password |
-| database | - | Database name |
-| schema | - | Schema name |
-| include_tables| - | Tables to synchronize, comma separated |
-| offset | initial | initial: full + incremental, latest: incremental only |
-| snapshot_split_size | 8096 | The size (in number of rows) of each split. During full synchronization, a table will be divided into multiple splits for synchronization. |
-| snapshot_parallelism | 1 | The parallelism level during the full synchronization phase, i.e., the maximum number of splits a single task can schedule at once. |
-
#### Doris Target DB Parameters
| Parameter | Default | Description |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/streaming-job/streaming-job-multi-table.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/streaming-job/streaming-job-multi-table.md
index 54e1a9e8c76..0adafb76a9c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/streaming-job/streaming-job-multi-table.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/streaming-job/streaming-job-multi-table.md
@@ -1,35 +1,28 @@
---
{
- "title": "Postgres/MySQL 持续导入",
+ "title": "MySQL/PostgreSQL 持续导入",
"language": "zh-CN",
- "description": "Doris 可以通过 Streaming Job 的方式,将 MySQL、Postgres 等多张表的全量和增量数据持续同步到 Doris 中。"
+ "description": "Doris 可以通过 Streaming Job 的方式,将 MySQL、PostgreSQL 等多张表的全量和增量数据持续同步到 Doris 中。"
}
---
## 概述
-支持通过 Job 将 MySQL、Postgres 等数据库的多张表的全量和增量数据,通过 Stream Load 的方式持续同步到 Doris 中。适用于需要实时同步多表数据到 Doris 的场景。
+支持通过 Job 将 MySQL、PostgreSQL 等数据库的多张表的全量和增量数据,通过 Stream Load 的方式持续同步到 Doris 中。适用于需要实时同步多表数据到 Doris 的场景。
-## 支持的数据源
-
-- MySQL
-- Postgres
-
-## 基本原理
-
-通过集成 [Flink CDC](https://github.com/apache/flink-cdc) 能力,Doris 支持从 MySQL、Postgres 等数据库读取变更日志,实现多表的全量和增量数据同步。首次同步时会自动创建 Doris 下游表(主键表),并保持主键与上游一致。
+通过集成 [Flink CDC](https://github.com/apache/flink-cdc) 能力,Doris 支持从 MySQL、PostgreSQL 等数据库读取变更日志,实现多表的全量和增量数据同步。首次同步时会自动创建 Doris 下游表(主键表),并保持主键与上游一致。
**注意事项:**
1. 目前只能保证 at-least-once 语义。
2. 目前只支持主键表同步。
3. 需要 Load 权限,若下游表不存在还需有 Create 权限。
+4. 自动创建表阶段,如果目标表已存在则会跳过,用户可以根据不同的场景自定义表。
-## 快速上手
+## MySQL 持续导入
### 前提条件
-#### MySQL
需要在 MySQL 端开启 Binlog,即 my.cnf 配置文件中增加:
```ini
log-bin=mysql-bin
@@ -37,16 +30,8 @@ binlog_format=ROW
server-id=1
```
-#### Postgres
-需要在 Postgres 端配置逻辑复制,即 postgresql.conf 增加:
-```ini
-wal_level=logical
-```
-
### 创建导入作业
-#### MySQL
-
```sql
CREATE JOB multi_table_sync
ON STREAMING
@@ -61,11 +46,35 @@ FROM MYSQL (
"offset" = "initial"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- 单BE部署时需要设置为1
)
```
-#### Postgres
+### MySQL 数据源参数
+
+| 参数 | 默认值 | 说明 |
+| -------------- | ------- | ------------------------------------------------------------ |
+| jdbc_url | - | MySQL JDBC 连接串 |
+| driver_url | - | JDBC 驱动 jar 包路径 |
+| driver_class | - | JDBC 驱动类名 |
+| user | - | 数据库用户名 |
+| password | - | 数据库密码 |
+| database | - | 数据库名 |
+| include_tables | - | 需要同步的表名,多个表用逗号分隔 |
+| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步 |
+| snapshot_split_size | 8096 | split 的大小 (行数),全量同步时,表会被切分成多个 split 进行同步 |
+| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
+
+## PostgreSQL 持续导入
+
+### 前提条件
+
+需要在 PostgreSQL 端配置逻辑复制,即 postgresql.conf 增加:
+```ini
+wal_level=logical
+```
+
+### 创建导入作业
```sql
CREATE JOB test_postgres_job
@@ -82,10 +91,28 @@ FROM POSTGRES (
"offset" = "latest"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- 单BE部署时需要设置为1
)
```
+### PostgreSQL 数据源参数
+
+| 参数 | 默认值 | 说明 |
+| -------------- | ------- | ------------------------------------------------------------ |
+| jdbc_url | - | PostgreSQL JDBC 连接串 |
+| driver_url | - | JDBC 驱动 jar 包路径 |
+| driver_class | - | JDBC 驱动类名 |
+| user | - | 数据库用户名 |
+| password | - | 数据库密码 |
+| database | - | 数据库名 |
+| schema | - | schema 名称 |
+| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表 |
+| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步 |
+| snapshot_split_size | 8096 | split 的大小 (行数),全量同步时,表会被切分成多个 split 进行同步 |
+| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
+
+## 通用操作
+
### 查看导入状态
```sql
@@ -142,7 +169,7 @@ DROP JOB where jobName = <job_name> ;
### 导入命令
-创建一个多表同步作业语法如下:
+创建多表同步作业语法如下:
```sql
CREATE JOB <job_name>
@@ -162,7 +189,7 @@ TO DATABASE <target_db> (
| job_name | 任务名 |
| job_properties | 用于指定 Job 的通用导入参数 |
| comment | 用于描述 Job 作业的备注信息 |
-| source_properties | 源端(MySQL/PG 等)相关参数 |
+| source_properties | 源端(MySQL/PostgreSQL 等)相关参数 |
| target_properties | Doris 目标库相关参数 |
### 导入参数
@@ -181,22 +208,6 @@ TO DATABASE <target_db> (
| ------------ | ------ | -------------------------------------- |
| max_interval | 10s | 当上游没有新增数据时,空闲的调度间隔。 |
-#### 数据源配置参数
-
-| 参数 | 默认值 | 说明 |
-| -------------- | ------- | ------------------------------------------------------------ |
-| jdbc_url | - | JDBC 连接串(MySQL/PG) |
-| driver_url | - | JDBC 驱动 jar 包路径 |
-| driver_class | - | JDBC 驱动类名 |
-| user | - | 数据库用户名 |
-| password | - | 数据库密码 |
-| database | - | 数据库名 |
-| schema | - | schema 名称 |
-| include_tables | - | 需要同步的表名,多个表用逗号分隔 |
-| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步 |
-| snapshot_split_size | 8096 | split 的大小 (行数),全量同步时,表会被切分成多个 split 进行同步 |
-| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
-
#### Doris 目标库端配置参数
| 参数 | 默认值 | 说明 |
@@ -294,4 +305,6 @@ RunningOffset: {"endOffset":{"ts_sec":"1765284495","file":"binlog.000002","pos":
| FinishTime | Task 的完成时间 |
| LoadStatistic | Task 的统计信息 |
| User | task 的执行者 |
+| RunningOffset | 当前 Task 同步的 Offset 信息。只有 Job.ExecuteType=Streaming 才有值 |
+| User | task 的执行者 |
| RunningOffset | 当前 Task 同步的 Offset 信息。只有 Job.ExecuteType=Streaming 才有值 |
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md
index 54e1a9e8c76..0bf2bce8ad4 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md
@@ -1,35 +1,28 @@
---
{
- "title": "Postgres/MySQL 持续导入",
+ "title": "MySQL/PostgreSQL 持续导入",
"language": "zh-CN",
- "description": "Doris 可以通过 Streaming Job 的方式,将 MySQL、Postgres 等多张表的全量和增量数据持续同步到 Doris 中。"
+ "description": "Doris 可以通过 Streaming Job 的方式,将 MySQL、PostgreSQL 等多张表的全量和增量数据持续同步到 Doris 中。"
}
---
## 概述
-支持通过 Job 将 MySQL、Postgres 等数据库的多张表的全量和增量数据,通过 Stream Load 的方式持续同步到 Doris 中。适用于需要实时同步多表数据到 Doris 的场景。
+支持通过 Job 将 MySQL、PostgreSQL 等数据库的多张表的全量和增量数据,通过 Stream Load 的方式持续同步到 Doris 中。适用于需要实时同步多表数据到 Doris 的场景。
-## 支持的数据源
-
-- MySQL
-- Postgres
-
-## 基本原理
-
-通过集成 [Flink CDC](https://github.com/apache/flink-cdc) 能力,Doris 支持从 MySQL、Postgres 等数据库读取变更日志,实现多表的全量和增量数据同步。首次同步时会自动创建 Doris 下游表(主键表),并保持主键与上游一致。
+通过集成 [Flink CDC](https://github.com/apache/flink-cdc) 能力,Doris 支持从 MySQL、PostgreSQL 等数据库读取变更日志,实现多表的全量和增量数据同步。首次同步时会自动创建 Doris 下游表(主键表),并保持主键与上游一致。
**注意事项:**
1. 目前只能保证 at-least-once 语义。
2. 目前只支持主键表同步。
3. 需要 Load 权限,若下游表不存在还需有 Create 权限。
+4. 自动创建表阶段,如果目标表已存在则会跳过,用户可以根据不同的场景自定义表。
-## 快速上手
+## MySQL 持续导入
### 前提条件
-#### MySQL
需要在 MySQL 端开启 Binlog,即 my.cnf 配置文件中增加:
```ini
log-bin=mysql-bin
@@ -37,16 +30,8 @@ binlog_format=ROW
server-id=1
```
-#### Postgres
-需要在 Postgres 端配置逻辑复制,即 postgresql.conf 增加:
-```ini
-wal_level=logical
-```
-
### 创建导入作业
-#### MySQL
-
```sql
CREATE JOB multi_table_sync
ON STREAMING
@@ -61,11 +46,35 @@ FROM MYSQL (
"offset" = "initial"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- 单 BE 部署时需要设置为 1
)
```
-#### Postgres
+### MySQL 数据源参数
+
+| 参数 | 默认值 | 说明 |
+| -------------- | ------- | ------------------------------------------------------------ |
+| jdbc_url | - | MySQL JDBC 连接串 |
+| driver_url | - | JDBC 驱动 jar 包路径 |
+| driver_class | - | JDBC 驱动类名 |
+| user | - | 数据库用户名 |
+| password | - | 数据库密码 |
+| database | - | 数据库名 |
+| include_tables | - | 需要同步的表名,多个表用逗号分隔 |
+| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步 |
+| snapshot_split_size | 8096 | split 的大小 (行数),全量同步时,表会被切分成多个 split 进行同步 |
+| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
+
+## PostgreSQL 持续导入
+
+### 前提条件
+
+需要在 PostgreSQL 端配置逻辑复制,即 postgresql.conf 增加:
+```ini
+wal_level=logical
+```
+
+### 创建导入作业
```sql
CREATE JOB test_postgres_job
@@ -82,10 +91,28 @@ FROM POSTGRES (
"offset" = "latest"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- 单 BE 部署时需要设置为 1
)
```
+### PostgreSQL 数据源参数
+
+| 参数 | 默认值 | 说明 |
+| -------------- | ------- | ------------------------------------------------------------ |
+| jdbc_url | - | PostgreSQL JDBC 连接串 |
+| driver_url | - | JDBC 驱动 jar 包路径 |
+| driver_class | - | JDBC 驱动类名 |
+| user | - | 数据库用户名 |
+| password | - | 数据库密码 |
+| database | - | 数据库名 |
+| schema | - | schema 名称 |
+| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表 |
+| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步 |
+| snapshot_split_size | 8096 | split 的大小 (行数),全量同步时,表会被切分成多个 split 进行同步 |
+| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
+
+## 通用操作
+
### 查看导入状态
```sql
@@ -142,7 +169,7 @@ DROP JOB where jobName = <job_name> ;
### 导入命令
-创建一个多表同步作业语法如下:
+创建多表同步作业语法如下:
```sql
CREATE JOB <job_name>
@@ -162,7 +189,7 @@ TO DATABASE <target_db> (
| job_name | 任务名 |
| job_properties | 用于指定 Job 的通用导入参数 |
| comment | 用于描述 Job 作业的备注信息 |
-| source_properties | 源端(MySQL/PG 等)相关参数 |
+| source_properties | 源端(MySQL/PostgreSQL 等)相关参数 |
| target_properties | Doris 目标库相关参数 |
### 导入参数
@@ -181,22 +208,6 @@ TO DATABASE <target_db> (
| ------------ | ------ | -------------------------------------- |
| max_interval | 10s | 当上游没有新增数据时,空闲的调度间隔。 |
-#### 数据源配置参数
-
-| 参数 | 默认值 | 说明 |
-| -------------- | ------- | ------------------------------------------------------------ |
-| jdbc_url | - | JDBC 连接串(MySQL/PG) |
-| driver_url | - | JDBC 驱动 jar 包路径 |
-| driver_class | - | JDBC 驱动类名 |
-| user | - | 数据库用户名 |
-| password | - | 数据库密码 |
-| database | - | 数据库名 |
-| schema | - | schema 名称 |
-| include_tables | - | 需要同步的表名,多个表用逗号分隔 |
-| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步 |
-| snapshot_split_size | 8096 | split 的大小 (行数),全量同步时,表会被切分成多个 split 进行同步 |
-| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
-
#### Doris 目标库端配置参数
| 参数 | 默认值 | 说明 |
diff --git a/versioned_docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md b/versioned_docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md
index 5ac0d6ea910..f706443e3cc 100644
--- a/versioned_docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md
+++ b/versioned_docs/version-4.x/data-operate/import/streaming-job/streaming-job-multi-table.md
@@ -1,35 +1,28 @@
---
{
- "title": "Postgres/MySQL Continuous Load",
+ "title": "MySQL/PostgreSQL Continuous Load",
"language": "en",
- "description": "Doris can continuously synchronize full and incremental data from multiple tables in MySQL, Postgres, etc. to Doris using Streaming Job."
+ "description": "Doris can continuously synchronize full and incremental data from multiple tables in MySQL, PostgreSQL, etc. to Doris using Streaming Job."
}
---
## Overview
-Supports using Job to continuously synchronize full and incremental data from multiple tables in MySQL, Postgres, etc. to Doris via Streaming Job. Suitable for scenarios requiring real-time multi-table data synchronization to Doris.
+Supports using Job to continuously synchronize full and incremental data from multiple tables in MySQL, PostgreSQL, etc. to Doris via Streaming Job. Suitable for scenarios requiring real-time multi-table data synchronization to Doris.
-## Supported Data Sources
-
-- MySQL
-- Postgres
-
-## Basic Principles
-
-By integrating [Flink CDC](https://github.com/apache/flink-cdc), Doris supports reading change logs from MySQL, Postgres, etc., enabling full and incremental multi-table data synchronization. When synchronizing for the first time, Doris automatically creates downstream tables (primary key tables) and keeps the primary key consistent with the upstream.
+By integrating [Flink CDC](https://github.com/apache/flink-cdc), Doris supports reading change logs from MySQL, PostgreSQL, etc., enabling full and incremental multi-table data synchronization. When synchronizing for the first time, Doris automatically creates downstream tables (primary key tables) and keeps the primary key consistent with the upstream.
**Notes:**
1. Currently only at-least-once semantics are guaranteed.
2. Only primary key tables are supported for synchronization.
3. LOAD privilege is required. If the downstream table does not exist, CREATE privilege is also required.
+4. During automatic table creation, if the target table already exists, it will be skipped, and users can customize tables according to different scenarios.
-## Quick Start
+## MySQL Continuous Load
### Prerequisites
-#### MySQL
Enable Binlog on MySQL by adding the following to my.cnf:
```ini
log-bin=mysql-bin
@@ -37,16 +30,8 @@ binlog_format=ROW
server-id=1
```
-#### Postgres
-Enable logical replication on Postgres by adding the following to postgresql.conf:
-```ini
-wal_level=logical
-```
-
### Creating an Import Job
-#### MySQL
-
```sql
CREATE JOB multi_table_sync
ON STREAMING
@@ -61,11 +46,35 @@ FROM MYSQL (
"offset" = "initial"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- Set to 1 for single BE deployment
)
```
-#### Postgres
+### MySQL Source Parameters
+
+| Parameter | Default | Description |
+| ------------- | ------- | ------------------------------------------- |
+| jdbc_url | - | MySQL JDBC connection string |
+| driver_url | - | JDBC driver jar path |
+| driver_class | - | JDBC driver class name |
+| user | - | Database username |
+| password | - | Database password |
+| database | - | Database name |
+| include_tables| - | Tables to synchronize, comma separated |
+| offset | initial | initial: full + incremental, latest: incremental only |
+| snapshot_split_size | 8096 | The size (in number of rows) of each split. During full synchronization, a table will be divided into multiple splits for synchronization. |
+| snapshot_parallelism | 1 | The parallelism level during the full synchronization phase, i.e., the maximum number of splits a single task can schedule at once. |
+
+## PostgreSQL Continuous Load
+
+### Prerequisites
+
+Enable logical replication on PostgreSQL by adding the following to postgresql.conf:
+```ini
+wal_level=logical
+```
+
+### Creating an Import Job
```sql
CREATE JOB test_postgres_job
@@ -82,10 +91,28 @@ FROM POSTGRES (
"offset" = "latest"
)
TO DATABASE target_test_db (
- "table.create.properties.replication_num" = "1"
+ "table.create.properties.replication_num" = "1" -- Set to 1 for single BE deployment
)
```
+### PostgreSQL Source Parameters
+
+| Parameter | Default | Description |
+| ------------- | ------- | ------------------------------------------- |
+| jdbc_url | - | PostgreSQL JDBC connection string |
+| driver_url | - | JDBC driver jar path |
+| driver_class | - | JDBC driver class name |
+| user | - | Database username |
+| password | - | Database password |
+| database | - | Database name |
+| schema | - | Schema name |
+| include_tables| - | Tables to synchronize, comma separated. If not specified, all tables will be synchronized by default. |
+| offset | initial | initial: full + incremental, latest: incremental only |
+| snapshot_split_size | 8096 | The size (in number of rows) of each split. During full synchronization, a table will be divided into multiple splits for synchronization. |
+| snapshot_parallelism | 1 | The parallelism level during the full synchronization phase, i.e., the maximum number of splits a single task can schedule at once. |
+
+## Common Operations
+
### Check Import Status
```sql
@@ -162,7 +189,7 @@ TO DATABASE <target_db> (
| job_name | Job name |
| job_properties | General import parameters |
| comment | Job comment |
-| source_properties | Source (MySQL/PG) parameters|
+| source_properties | Source (MySQL/PostgreSQL) parameters|
| target_properties | Doris target DB parameters |
### Import Parameters
@@ -181,22 +208,6 @@ TO DATABASE <target_db> (
| ------------- | ------- | ------------------------------------------- |
| max_interval | 10s | Idle scheduling interval when no new data |
-#### Source Configuration Parameters
-
-| Parameter | Default | Description |
-| ------------- | ------- | ------------------------------------------- |
-| jdbc_url | - | JDBC connection string (MySQL/PG) |
-| driver_url | - | JDBC driver jar path |
-| driver_class | - | JDBC driver class name |
-| user | - | Database username |
-| password | - | Database password |
-| database | - | Database name |
-| schema | - | Schema name |
-| include_tables| - | Tables to synchronize, comma separated |
-| offset | initial | initial: full + incremental, latest: incremental only |
-| snapshot_split_size | 8096 | The size (in number of rows) of each split. During full synchronization, a table will be divided into multiple splits for synchronization. |
-| snapshot_parallelism | 1 | The parallelism level during the full synchronization phase, i.e., the maximum number of splits a single task can schedule at once. |
-
#### Doris Target DB Parameters
| Parameter | Default | Description |
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]