This is an automated email from the ASF dual-hosted git repository.
JNSimba pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new ffd38bb7cdb [fix](doc) fix streaming job syntax and add data type
mapping for dev & 4.x (#3933)
ffd38bb7cdb is described below
commit ffd38bb7cdb2161a1aa6ffd7514039b5fe8bdd88
Author: wudi <[email protected]>
AuthorDate: Mon Jun 15 18:49:25 2026 +0800
[fix](doc) fix streaming job syntax and add data type mapping for dev & 4.x
(#3933)
## Summary
Fix incorrect Streaming Job (CDC continuous load) syntax in the docs and
add a dedicated data type mapping reference page. Applied to both **dev
(current)** and **version-4.x**, EN + ZH.
## Syntax fixes (verified against the SQL parser grammar and regression
cases)
- **`PROPERTIES` placement**: must come **before** `ON STREAMING` (right
after the job name), not after. The previous order would fail to parse.
- **`max_interval`**: an integer number of **seconds** (e.g. `10`); a
unit suffix like `10s` is not accepted.
- **`s3.max_batch_bytes`**: a plain **byte count** (e.g. `10737418240`,
valid range 100MB–10GB); `10G` is not accepted.
- **`offset`**:
- Multi-table CDC (`FROM ... TO DATABASE`) defaults to `latest` (was
documented as `initial`).
- `cdc_stream` TVF requires `offset` (it has no default); fixed the
param tables and the examples that omitted it (and a trailing-comma
issue in the zh examples).
## Docs improvements
- **Continuous Load Overview**: add a **Limitations** section
(primary-key tables only; schema-change/DDL sync scope) placed just
above the FAQ, and link to the new mapping page.
- **New page – Data Type Mapping**: how upstream MySQL/PostgreSQL column
types map to Doris column types, with value-representation notes
(unsigned widening, JSON, Base64-encoded binary, timezone normalization,
GeoJSON, one-dimensional arrays, etc.). Registered in the sidebars.
No behavior change — documentation only.
---
.../continuous-load-mysql-database.md | 4 +-
.../streaming-job/continuous-load-mysql-table.md | 2 +-
.../streaming-job/continuous-load-overview.md | 21 ++++++-
.../continuous-load-postgresql-database.md | 4 +-
.../continuous-load-postgresql-table.md | 2 +-
.../import-way/streaming-job/continuous-load-s3.md | 6 +-
.../import-way/streaming-job/data-type-mapping.md | 71 ++++++++++++++++++++++
.../table-valued-functions/cdc-stream.md | 2 +-
.../sql-statements/job/CREATE-STREAMING-JOB.md | 16 ++---
.../continuous-load-mysql-database.md | 4 +-
.../streaming-job/continuous-load-mysql-table.md | 2 +-
.../streaming-job/continuous-load-overview.md | 21 ++++++-
.../continuous-load-postgresql-database.md | 4 +-
.../continuous-load-postgresql-table.md | 2 +-
.../import-way/streaming-job/continuous-load-s3.md | 6 +-
.../import-way/streaming-job/data-type-mapping.md | 71 ++++++++++++++++++++++
.../table-valued-functions/cdc-stream.md | 6 ++
.../sql-statements/job/CREATE-STREAMING-JOB.md | 20 +++---
.../continuous-load-mysql-database.md | 4 +-
.../streaming-job/continuous-load-mysql-table.md | 2 +-
.../streaming-job/continuous-load-overview.md | 21 ++++++-
.../continuous-load-postgresql-database.md | 4 +-
.../continuous-load-postgresql-table.md | 2 +-
.../import-way/streaming-job/continuous-load-s3.md | 6 +-
.../import-way/streaming-job/data-type-mapping.md | 71 ++++++++++++++++++++++
.../table-valued-functions/cdc-stream.md | 6 ++
.../sql-statements/job/CREATE-STREAMING-JOB.md | 20 +++---
sidebars.ts | 1 +
.../continuous-load-mysql-database.md | 4 +-
.../streaming-job/continuous-load-mysql-table.md | 2 +-
.../streaming-job/continuous-load-overview.md | 21 ++++++-
.../continuous-load-postgresql-database.md | 4 +-
.../continuous-load-postgresql-table.md | 2 +-
.../import-way/streaming-job/continuous-load-s3.md | 6 +-
.../import-way/streaming-job/data-type-mapping.md | 71 ++++++++++++++++++++++
.../table-valued-functions/cdc-stream.md | 2 +-
.../sql-statements/job/CREATE-STREAMING-JOB.md | 16 ++---
versioned_sidebars/version-4.x-sidebars.json | 3 +-
38 files changed, 457 insertions(+), 75 deletions(-)
diff --git
a/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
b/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
index c34d0660e8a..0b52d267d4f 100644
---
a/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
+++
b/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
@@ -144,7 +144,7 @@ The MySQL source side (`FROM MYSQL`) supports the following
parameters:
| password | - | Database password
|
| database | - | Database name
|
| include_tables | - | Names of tables to sync, separated by
commas. If left empty, all tables are synced by default |
-| offset | initial | `initial`: full + incremental sync;
`latest`: incremental sync only
|
+| offset | latest | `latest`: incremental sync only; `initial`:
full + incremental sync |
| snapshot_split_size | 8096 | Size of a split (in rows). During full
sync, a table is divided into multiple splits for sync |
| snapshot_parallelism | 1 | Parallelism during the full-sync phase,
that is, the maximum number of splits a single Task can schedule at a time |
@@ -158,8 +158,8 @@ The syntax for creating an Auto Table Creation Sync job is
as follows:
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM MYSQL (
[source_properties]
diff --git
a/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
b/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
index 01f010f9552..797aead44ca 100644
---
a/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
+++
b/docs/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
@@ -92,7 +92,7 @@ The MySQL data source parameters supported by CDC Stream TVF
are as follows:
| password | - | Database password
|
| database | - | Database name
|
| table | - | Name of the table to sync
|
-| offset | initial | `initial`: full + incremental sync;
`latest`: incremental sync only
|
+| offset | latest | `latest`: incremental sync only;
`initial`: full + incremental sync
|
| snapshot_split_size | 8096 | Size of a split (in rows). During full
sync, the table is divided into multiple splits for syncing
|
| snapshot_parallelism | 1 | Parallelism during the full sync stage,
that is, the maximum number of splits scheduled per Task
|
diff --git
a/docs/data-operate/import/import-way/streaming-job/continuous-load-overview.md
b/docs/data-operate/import/import-way/streaming-job/continuous-load-overview.md
index d82cba73319..9b03d700601 100644
---
a/docs/data-operate/import/import-way/streaming-job/continuous-load-overview.md
+++
b/docs/data-operate/import/import-way/streaming-job/continuous-load-overview.md
@@ -46,6 +46,8 @@ Continuous load supports the following data sources and sync
modes:
| PostgreSQL | 14, 15, 16, 17 | [PostgreSQL CDC with SQL
Mapping](./continuous-load-postgresql-table.md) | [PostgreSQL CDC with Auto
Table Creation](./continuous-load-postgresql-database.md) | [Amazon RDS
PostgreSQL](./prerequisites/amazon-rds-postgresql.md) · [Amazon Aurora
PostgreSQL](./prerequisites/amazon-aurora-postgresql.md) |
| S3 | - | [S3 Continuous
Load](./continuous-load-s3.md) | -
| -
|
+For how upstream column types map to Doris types, see [Data Type
Mapping](./data-type-mapping.md).
+
## How to Choose a Sync Method
<!-- Knowledge type: Architecture selection decision -->
@@ -197,7 +199,24 @@ DROP JOB WHERE jobName = <job_name>;
| Parameter | Default | Description
|
| ------------ | ------- |
------------------------------------------------------ |
-| max_interval | 10s | Idle scheduling interval when the upstream has no
new data. |
+| max_interval | 10 | Idle scheduling interval in seconds when the
upstream has no new data. Only an integer (number of seconds) is accepted, e.g.
`10`; a unit suffix such as `10s` is not supported. Must be >= 1. |
+
+## Limitations
+
+<!-- Knowledge type: Limitations / unsupported boundaries -->
+
+Sync scope, automatic table creation, and semantic guarantees (exactly-once /
at-least-once) are described in [Capability
Comparison](#capability-comparison). This section only lists the constraints
and behaviors that are **not supported**.
+
+### Primary Key Tables
+
+Only upstream tables **with a primary key** can be synchronized (both sync
methods). The corresponding Doris table is a **Unique Key** table —
auto-created as such in Auto Table Creation Sync, or created by you in SQL
Mapping Sync. Tables without a primary key are not supported.
+
+### Schema Change (DDL)
+
+DDL sync applies **only to Auto Table Creation Sync**; SQL Mapping (TVF) does
not sync any DDL.
+
+- **PostgreSQL** (supported since 4.1): only `ADD COLUMN` and `DROP COLUMN`
are synced. **Column type changes, `RENAME COLUMN`, and constraint / index /
partition changes are NOT synced** — apply them manually in Doris.
+- **MySQL**: upstream DDL is **not synced yet** — adjust the Doris table
schema manually.
## FAQ
diff --git
a/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
b/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
index cd81fef3613..6e597b9394d 100644
---
a/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
+++
b/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
@@ -95,7 +95,7 @@ PostgreSQL source parameters configure the JDBC connection,
sync scope, and full
| database | - | Database name
|
| schema | - | Schema name
|
| include_tables | - | Tables to sync, separated by commas. If
left empty, all tables are synced by default
|
-| offset | initial | initial: full + incremental sync; latest:
incremental-only sync
|
+| offset | latest | latest: incremental-only sync; initial:
full + incremental sync
|
| snapshot_split_size | 8096 | Size of a split (in rows). During full
sync, a table is divided into multiple splits for synchronization
|
| snapshot_parallelism | 1 | Parallelism during the full-sync phase,
that is, the maximum number of splits scheduled in a single Task
|
@@ -119,8 +119,8 @@ The syntax for creating an Auto Table Creation Sync job is
as follows:
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM POSTGRES (
[source_properties]
diff --git
a/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
b/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
index 623679d1424..a9bc4d8042b 100644
---
a/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
+++
b/docs/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
@@ -102,7 +102,7 @@ Data source parameters are configured through the
`cdc_stream(...)` TVF. They de
| database | - | Database name.
|
| schema | - | Schema name.
|
| table | - | Name of the table to sync.
|
-| offset | initial | `initial`: full + incremental sync.
`latest`: incremental sync only.
|
+| offset | latest | `latest`: incremental sync only. `initial`:
full + incremental sync.
|
| snapshot_split_size | 8096 | Split size in rows. During full sync, the
table is divided into multiple splits for syncing.
|
| snapshot_parallelism | 1 | Parallelism for the full sync stage, that
is, the maximum number of splits scheduled per Task.
|
diff --git
a/docs/data-operate/import/import-way/streaming-job/continuous-load-s3.md
b/docs/data-operate/import/import-way/streaming-job/continuous-load-s3.md
index e830f867ff7..cca869dc64c 100644
--- a/docs/data-operate/import/import-way/streaming-job/continuous-load-s3.md
+++ b/docs/data-operate/import/import-way/streaming-job/continuous-load-s3.md
@@ -155,8 +155,8 @@ The syntax for creating an S3 TVF continuous load job is as
follows:
```SQL
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -178,8 +178,8 @@ The following parameters can be configured in
`job_properties`:
| ------------------ | ------- |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
| session.\* | None | All session variables can be configured under
`job_properties`. For load variables, see [Insert Into
Select](../../../../data-operate/import/import-way/insert-into-manual.md#import-configuration-parameters)
|
| s3.max_batch_files | 256 | Triggers a load write when the cumulative
number of files reaches this value
|
-| s3.max_batch_bytes | 10G | Triggers a load write when the cumulative
data volume reaches this value
|
-| max_interval | 10s | The idle scheduling interval when there are
no new files or data from upstream
|
+| s3.max_batch_bytes | 10737418240 | Triggers a load write when the cumulative
data volume (in bytes) reaches this value. Only a plain integer byte count is
accepted, e.g. `10737418240`; a unit suffix such as `10G` is not supported.
Valid range is 100MB–10GB, i.e. 104857600–10737418240
|
+| max_interval | 10 | The idle scheduling interval in seconds when
there are no new files or data from upstream. Only an integer (number of
seconds) is accepted, e.g. `10`; a unit suffix such as `10s` is not supported.
Must be >= 1 |
:::tip Batch trigger rule
A write is triggered when either `s3.max_batch_files` or `s3.max_batch_bytes`
is satisfied. When there are no new files from upstream, idle polling proceeds
according to `max_interval`.
diff --git
a/docs/data-operate/import/import-way/streaming-job/data-type-mapping.md
b/docs/data-operate/import/import-way/streaming-job/data-type-mapping.md
new file mode 100644
index 00000000000..a1a6ee06daa
--- /dev/null
+++ b/docs/data-operate/import/import-way/streaming-job/data-type-mapping.md
@@ -0,0 +1,71 @@
+---
+{
+ "title": "Data Type Mapping",
+ "language": "en",
+ "description": "Data type mapping for Doris Streaming Job CDC: how
upstream MySQL and PostgreSQL types map to Doris column types during automatic
table creation, and how values are converted when written into Doris."
+}
+---
+
+<!-- Knowledge type: Reference / data type mapping -->
+
+The tables below show how each upstream MySQL or PostgreSQL column type maps
to a Doris column type. There is a single mapping, shared by both sync methods
— the only difference is who creates the target table:
+
+- **Auto Table Creation Sync**: Doris reads the upstream schema and creates
the target table with these Doris column types automatically.
+- **SQL Mapping Sync**: you create the target table yourself, using a
compatible Doris type for each column.
+
+In both cases, the values are written following the same mapping. For types
that are not supported, see
[Limitations](./continuous-load-overview.md#limitations).
+
+## MySQL to Doris
+
+| MySQL Type | Doris Type | Notes |
+| --- | --- | --- |
+| `BOOLEAN` / `TINYINT(1)` | `BOOLEAN` | |
+| `TINYINT` | `TINYINT` | `UNSIGNED` → `SMALLINT` |
+| `SMALLINT` | `SMALLINT` | `UNSIGNED` → `INT` |
+| `MEDIUMINT` | `INT` | `UNSIGNED` → `INT` |
+| `INT` | `INT` | `UNSIGNED` → `BIGINT` |
+| `BIGINT` | `BIGINT` | `UNSIGNED` → `LARGEINT` |
+| `YEAR` | `SMALLINT` | |
+| `FLOAT` | `FLOAT` | |
+| `DOUBLE` | `DOUBLE` | |
+| `DECIMAL(p,s)` | `DECIMAL(p,s)` | Very high precision falls back to `STRING`
|
+| `DATE` | `DATE` | |
+| `DATETIME` | `DATETIME(s)` | Microsecond scale (0–6) preserved |
+| `TIMESTAMP` | `DATETIME(s)` | Normalized to the server time zone |
+| `TIME` | `STRING` | Serialized as a string, e.g. `12:34:56.000000` |
+| `CHAR` | `CHAR` | |
+| `VARCHAR` | `VARCHAR` | |
+| `TINYTEXT` / `TEXT` / `MEDIUMTEXT` / `LONGTEXT` | `STRING` | |
+| `BINARY` / `VARBINARY` / `TINYBLOB` / `BLOB` / `MEDIUMBLOB` / `LONGBLOB` |
`STRING` | Base64-encoded string |
+| `BIT(1)` | `BOOLEAN` | |
+| `BIT(n>1)` | `STRING` | Base64-encoded string |
+| `JSON` | `STRING` | Stored as JSON text |
+| `ENUM` | `STRING` | Resolved to the label value |
+| `SET` | `STRING` | Resolved to comma-separated labels |
+| Spatial types (`GEOMETRY`, `POINT`, …) and other types | Not supported |
Auto Table Creation fails for tables with such columns; use SQL Mapping Sync
(cast in `SELECT`) or exclude the column |
+
+## PostgreSQL to Doris
+
+| PostgreSQL Type | Doris Type | Notes |
+| --- | --- | --- |
+| `bool` | `BOOLEAN` | |
+| `bit(1)` | `BOOLEAN` | |
+| `bit(n>1)` / `varbit` | `STRING` | |
+| `int2` / `smallserial` | `SMALLINT` | |
+| `int4` / `serial` | `INT` | |
+| `int8` / `bigserial` | `BIGINT` | |
+| `float4` | `FLOAT` | |
+| `float8` | `DOUBLE` | |
+| `numeric(p,s)` | `DECIMAL(min(p,38), s)` | Precision capped at 38; scale
defaults to 9 when unspecified |
+| `bpchar(n)` (i.e. `CHAR(n)`) | `CHAR` / `VARCHAR` | Length scaled ×3 for
UTF-8; `> 255` becomes `VARCHAR` |
+| `varchar` / `text` | `STRING` | PostgreSQL `varchar` maps to `STRING` (no
length) |
+| `date` | `DATE` | |
+| `timestamp` / `timestamptz` | `DATETIME(s)` | Scale 0–6; `timestamptz`
normalized to the server time zone |
+| `time` / `timetz` / `interval` | `STRING` | `timetz` keeps its
UTC-normalized offset |
+| `uuid` / `inet` / `cidr` / `macaddr` / `macaddr8` | `STRING` | |
+| `bytea` | `STRING` | Base64-encoded string |
+| `xml` / `hstore` | `STRING` | |
+| `json` / `jsonb` | `JSON` | |
+| Geometry types (`point`, `line`, `polygon`, …) | `STRING` | Serialized as a
GeoJSON string (`type` / `coordinates` / `srid`) |
+| Array types (e.g. `int4[]`, `text[]`) | `ARRAY<T>` | One-dimensional only |
+| Other / user-defined types | `STRING` | |
diff --git a/docs/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
b/docs/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
index e96fa2521da..41741eb945c 100644
--- a/docs/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
+++ b/docs/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
@@ -49,7 +49,7 @@ cdc_stream(
| Parameter | Default | Description
|
|------------------------|---------|--------------------------------------------------------------|
| `schema` | - | Schema name, required for PostgreSQL
|
-| `offset` | initial | `initial`: full + incremental sync;
`latest`: incremental only |
+| `offset` | latest | `latest`: incremental only; `initial`:
full + incremental; also supports `snapshot`, `earliest` (MySQL only), or a
JSON binlog/LSN position |
| `snapshot_split_size` | 8096 | Split size (in rows). During full sync,
the table is divided into multiple splits |
| `snapshot_parallelism` | 1 | Parallelism during full sync phase, i.e.,
max splits per task |
diff --git a/docs/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
b/docs/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
index 5fa42fd6fbd..dc5b097ccdf 100644
--- a/docs/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
+++ b/docs/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
@@ -23,12 +23,12 @@ For detailed usage, see [Continuous
Load](../../../data-operate/import/import-wa
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -37,12 +37,12 @@ DO <Insert_Command>
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
FROM <sourceType> (
<source_property>
@@ -77,7 +77,7 @@ TO DATABASE <target_db> (
| database | - | Database name
|
| schema | - | Schema name (PostgreSQL)
|
| include_tables | - | Tables to synchronize, comma separated. If not
specified, all tables will be synchronized by default. |
-| offset | initial | initial: full + incremental sync, latest:
incremental only |
+| offset | latest | `latest`: incremental only (default); `initial`:
full + incremental sync |
| snapshot_split_size | 8096 | Split size (in rows). During full sync, the
table is divided into multiple splits. |
| snapshot_parallelism | 1 | Parallelism during full sync phase, i.e., max
splits per task. |
@@ -100,8 +100,8 @@ TO DATABASE <target_db> (
| ------------------ | ------- |
------------------------------------------------------------ |
| session.* | - | Supports configuring all session variables in
job_properties (TVF Mode only) |
| s3.max_batch_files | 256 | Triggers an import write when the accumulated
file count reaches this value (S3 TVF only) |
-| s3.max_batch_bytes | 10G | Triggers an import write when the accumulated
data size reaches this value (S3 TVF only) |
-| max_interval | 10s | Idle scheduling interval when there are no
new files or data upstream |
+| s3.max_batch_bytes | 10737418240 | Triggers an import write when the
accumulated data size (in bytes) reaches this value. Only a plain integer byte
count is accepted, e.g. `10737418240`; a unit suffix such as `10G` is not
supported. Valid range is 100MB–10GB, i.e. 104857600–10737418240. (S3 TVF only)
|
+| max_interval | 10 | Idle scheduling interval in seconds when
there are no new files or data upstream. Only an integer (number of seconds) is
accepted, e.g. `10`; a unit suffix such as `10s` is not supported. Must be >= 1
|
## Privilege Control
@@ -153,7 +153,8 @@ The user executing this SQL command must have at least the
following privileges:
"user" = "root",
"password" = "123456",
"database" = "source_db",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
@@ -173,7 +174,8 @@ The user executing this SQL command must have at least the
following privileges:
"password" = "postgres",
"database" = "postgres",
"schema" = "public",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
index 4c4a04e51d7..1071d5991ca 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
@@ -144,7 +144,7 @@ MySQL 源端 (`FROM MYSQL`) 支持的参数如下:
| password | - | 数据库密码
|
| database | - | 数据库名
|
| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表
|
-| offset | initial | `initial`:全量 + 增量同步;`latest`:仅增量同步
|
+| offset | latest | `latest`:仅增量同步;`initial`:全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数),全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
@@ -158,8 +158,8 @@ MySQL 源端 (`FROM MYSQL`) 支持的参数如下:
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM MYSQL (
[source_properties]
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
index 7807d76f337..d4c7abb22c9 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
@@ -92,7 +92,7 @@ CDC Stream TVF 支持的 MySQL 数据源参数如下:
| password | - | 数据库密码
|
| database | - | 数据库名
|
| table | - | 需要同步的表名
|
-| offset | initial | `initial`:全量 + 增量同步;`latest`:仅增量同步
|
+| offset | latest | `latest`:仅增量同步;`initial`:全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数)。全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-overview.md
index 4943a2633f7..a22e3eb02b8 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-overview.md
@@ -46,6 +46,8 @@ Doris 支持通过 **Streaming Job** 的方式,从多种数据源持续导入
| PostgreSQL | 14、15、16、17 | [PostgreSQL CDC SQL
映射同步](./continuous-load-postgresql-table.md) | [PostgreSQL CDC
自动建表同步](./continuous-load-postgresql-database.md) | [Amazon RDS
PostgreSQL](./prerequisites/amazon-rds-postgresql.md) · [Amazon Aurora
PostgreSQL](./prerequisites/amazon-aurora-postgresql.md) |
| S3 | - | [S3 持续导入](./continuous-load-s3.md)
| -
| -
|
+上游列类型如何映射为 Doris 类型,见[数据类型映射](./data-type-mapping.md)。
+
## 如何选择同步方式
<!-- 知识类型: 架构选型决策 -->
@@ -197,7 +199,24 @@ DROP JOB WHERE jobName = <job_name>;
| 参数 | 默认值 | 说明 |
| ------------ | ------ | -------------------------------------- |
-| max_interval | 10s | 当上游没有新增数据时,空闲的调度间隔。 |
+| max_interval | 10 | 当上游没有新增数据时,空闲的调度间隔,单位为秒。只接受整数(秒数),如 `10`;不支持带单位后缀(如
`10s`)。取值需 >= 1。 |
+
+## 使用限制
+
+<!-- 知识类型: 限制 / 不支持的边界 -->
+
+同步范围、自动建表、一致性语义(exactly-once /
at-least-once)已在[能力对比](#能力对比)中说明。本节只列出**不支持**的约束与行为。
+
+### 主键表
+
+仅支持同步**带主键**的上游表(两种同步方式均如此)。对应的 Doris 表为 **Unique Key 表**——自动建表同步下由 Doris
自动建成,SQL 映射同步下由你自己创建。无主键的表不支持。
+
+### Schema Change(DDL)
+
+DDL 同步**仅适用于自动建表同步**;SQL 映射(TVF)不同步任何 DDL。
+
+- **PostgreSQL**(4.1 起支持):仅 `ADD COLUMN` 和 `DROP COLUMN` 会同步。**列类型变更、`RENAME
COLUMN`、约束 / 索引 / 分区变更不会同步**——需在 Doris 端手工处理。
+- **MySQL**:上游 DDL **暂不同步**——需手工调整 Doris 表结构。
## FAQ
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
index 6e760fc89be..f0537c74b0b 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
@@ -95,7 +95,7 @@ PostgreSQL 源端参数用于配置 JDBC 连接、同步范围以及全量切片
| database | - | 数据库名
|
| schema | - | Schema 名称
|
| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表
|
-| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步
|
+| offset | latest | latest: 仅增量同步,initial: 全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数),全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
@@ -119,8 +119,8 @@ PostgreSQL 源端参数用于配置 JDBC 连接、同步范围以及全量切片
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM POSTGRES (
[source_properties]
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
index bab53b7467a..deff0043b52 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
@@ -102,7 +102,7 @@ select * from jobs("type"="insert") where ExecuteType =
"STREAMING";
| database | - | 数据库名
|
| schema | - | Schema 名称
|
| table | - | 需要同步的表名
|
-| offset | initial | `initial`:全量 + 增量同步;`latest`:仅增量同步
|
+| offset | latest | `latest`:仅增量同步;`initial`:全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数)。全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-s3.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-s3.md
index 8c7e8d09342..5a5364ef9b2 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-s3.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/continuous-load-s3.md
@@ -155,8 +155,8 @@ select * from S3(
```SQL
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -178,8 +178,8 @@ DO <Insert_Command>
| ------------------ | ------ |
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|
| session.\* | 无 | 支持在 `job_properties` 上配置所有的 session 变量,导入变量可参考
[Insert Into
Select](../../../../data-operate/import/import-way/insert-into-manual.md#导入配置参数)
|
| s3.max_batch_files | 256 | 当累计文件数达到该值时触发一次导入写入
|
-| s3.max_batch_bytes | 10G | 当累计数据量达到该值时触发一次导入写入
|
-| max_interval | 10s | 当上游没有新增文件或数据时,空闲的调度间隔
|
+| s3.max_batch_bytes | 10737418240 | 当累计数据量(字节)达到该值时触发一次导入写入。只接受纯整数字节数,如
`10737418240`;不支持带单位后缀(如 `10G`)。取值范围 100MB–10GB,即 104857600–10737418240
|
+| max_interval | 10 | 当上游没有新增文件或数据时,空闲的调度间隔,单位为秒。只接受整数(秒数),如
`10`;不支持带单位后缀(如 `10s`)。取值需 >= 1
|
:::tip 批次触发规则
`s3.max_batch_files` 与 `s3.max_batch_bytes` 满足任一条件即触发写入;当上游无新文件时,按
`max_interval` 进入空闲轮询。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/data-type-mapping.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/data-type-mapping.md
new file mode 100644
index 00000000000..9a047267508
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/streaming-job/data-type-mapping.md
@@ -0,0 +1,71 @@
+---
+{
+ "title": "数据类型映射",
+ "language": "zh-CN",
+ "description": "Doris Streaming Job CDC 的数据类型映射:自动建表时上游 MySQL / PostgreSQL
类型如何映射为 Doris 列类型,以及写入 Doris 时值如何转换。"
+}
+---
+
+<!-- 知识类型: 参考 / 数据类型映射 -->
+
+下表展示上游 MySQL / PostgreSQL 的每个列类型如何映射为 Doris
列类型。**只有一套映射**,两种同步方式共用,区别只在于由谁创建目标表:
+
+- **自动建表同步**:Doris 读取上游表结构,自动按这些 Doris 列类型创建目标表。
+- **SQL 映射同步**:你自己创建目标表,每列使用兼容的 Doris 类型。
+
+两种方式下,写入的值都遵循同一套映射。不支持的类型请参见[使用限制](./continuous-load-overview.md#使用限制)。
+
+## MySQL 到 Doris
+
+| MySQL 类型 | Doris 类型 | 备注 |
+| --- | --- | --- |
+| `BOOLEAN` / `TINYINT(1)` | `BOOLEAN` | |
+| `TINYINT` | `TINYINT` | `UNSIGNED` → `SMALLINT` |
+| `SMALLINT` | `SMALLINT` | `UNSIGNED` → `INT` |
+| `MEDIUMINT` | `INT` | `UNSIGNED` → `INT` |
+| `INT` | `INT` | `UNSIGNED` → `BIGINT` |
+| `BIGINT` | `BIGINT` | `UNSIGNED` → `LARGEINT` |
+| `YEAR` | `SMALLINT` | |
+| `FLOAT` | `FLOAT` | |
+| `DOUBLE` | `DOUBLE` | |
+| `DECIMAL(p,s)` | `DECIMAL(p,s)` | 超高精度回退为 `STRING` |
+| `DATE` | `DATE` | |
+| `DATETIME` | `DATETIME(s)` | 保留微秒精度(0–6) |
+| `TIMESTAMP` | `DATETIME(s)` | 按 server 时区归一 |
+| `TIME` | `STRING` | 以字符串存储,如 `12:34:56.000000` |
+| `CHAR` | `CHAR` | |
+| `VARCHAR` | `VARCHAR` | |
+| `TINYTEXT` / `TEXT` / `MEDIUMTEXT` / `LONGTEXT` | `STRING` | |
+| `BINARY` / `VARBINARY` / `TINYBLOB` / `BLOB` / `MEDIUMBLOB` / `LONGBLOB` |
`STRING` | Base64 编码字符串 |
+| `BIT(1)` | `BOOLEAN` | |
+| `BIT(n>1)` | `STRING` | Base64 编码字符串 |
+| `JSON` | `STRING` | 以 JSON 文本存储 |
+| `ENUM` | `STRING` | 解析为标签值 |
+| `SET` | `STRING` | 解析为逗号分隔的标签 |
+| 空间类型(`GEOMETRY`、`POINT` 等)及其他类型 | 不支持 | 含此类列的表自动建表会失败;可改用 SQL 映射同步(在
`SELECT` 中转换)或排除该列 |
+
+## PostgreSQL 到 Doris
+
+| PostgreSQL 类型 | Doris 类型 | 备注 |
+| --- | --- | --- |
+| `bool` | `BOOLEAN` | |
+| `bit(1)` | `BOOLEAN` | |
+| `bit(n>1)` / `varbit` | `STRING` | |
+| `int2` / `smallserial` | `SMALLINT` | |
+| `int4` / `serial` | `INT` | |
+| `int8` / `bigserial` | `BIGINT` | |
+| `float4` | `FLOAT` | |
+| `float8` | `DOUBLE` | |
+| `numeric(p,s)` | `DECIMAL(min(p,38), s)` | 精度上限 38;未指定 scale 时默认 9 |
+| `bpchar(n)`(即 `CHAR(n)`) | `CHAR` / `VARCHAR` | 长度 ×3 以兼容 UTF-8;`> 255` 转为
`VARCHAR` |
+| `varchar` / `text` | `STRING` | 注意:PostgreSQL `varchar` 映射为 `STRING`(不带长度) |
+| `date` | `DATE` | |
+| `timestamp` / `timestamptz` | `DATETIME(s)` | 精度 0–6;`timestamptz` 按 server
时区归一 |
+| `time` / `timetz` / `interval` | `STRING` | `timetz` 保留其 UTC 归一后的偏移量 |
+| `uuid` / `inet` / `cidr` / `macaddr` / `macaddr8` | `STRING` | |
+| `bytea` | `STRING` | Base64 编码字符串 |
+| `xml` / `hstore` | `STRING` | |
+| `json` / `jsonb` | `JSON` | |
+| 几何类型(`point`、`line`、`polygon` 等) | `STRING` | 序列化为 GeoJSON 字符串(`type` /
`coordinates` / `srid`) |
+| 数组类型(如 `int4[]`、`text[]`) | `ARRAY<T>` | 仅支持一维 |
+| 其他 / 用户自定义类型 | `STRING` | |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
index 5cd8fb4de0f..2fdab53329d 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
@@ -53,6 +53,9 @@ cdc_stream(
| 参数 | 默认值 | 描述
|
|----------------------|---------|--------------------------------------------------------------|
| `schema` | - | Schema 名称,PostgreSQL 必填
|
+| `offset` | latest | `latest`:仅增量;`initial`:全量 + 增量;另支持
`snapshot`、`earliest`(仅 MySQL)、或一个 JSON 形式的 binlog/LSN 位点 |
+| `snapshot_split_size` | 8096 | split 的大小(行数),全量同步时表会被切分成多个 split |
+| `snapshot_parallelism` | 1 | 全量阶段的并行度,即单次 Task 最多调度的 split 数量 |
## 注意事项
@@ -80,6 +83,7 @@ cdc_stream(
"password" = "123456",
"database" = "source_db",
"table" = "source_table",
+ "offset" = "initial"
)
```
@@ -100,6 +104,7 @@ cdc_stream(
"database" = "postgres",
"schema" = "public",
"table" = "source_table",
+ "offset" = "initial"
)
```
@@ -120,5 +125,6 @@ cdc_stream(
"password" = "123456",
"database" = "source_db",
"table" = "source_table",
+ "offset" = "initial"
)
```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
index d4968b71968..128539f20b5 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
@@ -23,12 +23,12 @@ Doris Streaming Job 是基于 Job 的方式,创建一个持续导入任务。
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -37,12 +37,12 @@ DO <Insert_Command>
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
FROM <sourceType> (
<source_property>
@@ -77,7 +77,7 @@ TO DATABASE <target_db> (
| database | - | 数据库名
|
| schema | - | Schema 名称(PostgreSQL)
|
| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表 |
-| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步
|
+| offset | latest | `latest`: 仅增量同步(默认);`initial`: 全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数),全量同步时,表会被切分成多个 split 进行同步 |
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
@@ -100,8 +100,8 @@ TO DATABASE <target_db> (
| ------------------ | ------ |
------------------------------------------------------------ |
| session.* | 无 | 支持在 job_properties 上配置所有的 session 变量(仅 TVF 模式) |
| s3.max_batch_files | 256 | 当累计文件数达到该值时触发一次导入写入(仅 S3 TVF) |
-| s3.max_batch_bytes | 10G | 当累计数据量达到该值时触发一次导入写入(仅 S3 TVF) |
-| max_interval | 10s | 当上游没有新增文件或数据时,空闲的调度间隔 |
+| s3.max_batch_bytes | 10737418240 | 当累计数据量(字节)达到该值时触发一次导入写入。只接受纯整数字节数,如
`10737418240`;不支持带单位后缀(如 `10G`)。取值范围 100MB–10GB,即 104857600–10737418240。(仅 S3
TVF) |
+| max_interval | 10 | 当上游没有新增文件或数据时,空闲的调度间隔,单位为秒。只接受整数(秒数),如
`10`;不支持带单位后缀(如 `10s`)。取值需 >= 1 |
## 权限控制
@@ -121,11 +121,11 @@ TO DATABASE <target_db> (
```sql
CREATE JOB my_job
-ON STREAMING
PROPERTIES (
- "max_interval" = "30s",
+ "max_interval" = "30",
"session.insert_max_filter_ratio" = "0.5"
)
+ON STREAMING
DO
INSERT INTO db1.tbl1
SELECT * FROM S3(
@@ -177,7 +177,8 @@ SELECT * FROM S3(
"user" = "root",
"password" = "123456",
"database" = "source_db",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
@@ -197,7 +198,8 @@ SELECT * FROM S3(
"password" = "postgres",
"database" = "postgres",
"schema" = "public",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
index 4c4a04e51d7..1071d5991ca 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
@@ -144,7 +144,7 @@ MySQL 源端 (`FROM MYSQL`) 支持的参数如下:
| password | - | 数据库密码
|
| database | - | 数据库名
|
| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表
|
-| offset | initial | `initial`:全量 + 增量同步;`latest`:仅增量同步
|
+| offset | latest | `latest`:仅增量同步;`initial`:全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数),全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
@@ -158,8 +158,8 @@ MySQL 源端 (`FROM MYSQL`) 支持的参数如下:
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM MYSQL (
[source_properties]
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
index 7807d76f337..d4c7abb22c9 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
@@ -92,7 +92,7 @@ CDC Stream TVF 支持的 MySQL 数据源参数如下:
| password | - | 数据库密码
|
| database | - | 数据库名
|
| table | - | 需要同步的表名
|
-| offset | initial | `initial`:全量 + 增量同步;`latest`:仅增量同步
|
+| offset | latest | `latest`:仅增量同步;`initial`:全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数)。全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
index 4943a2633f7..a22e3eb02b8 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
@@ -46,6 +46,8 @@ Doris 支持通过 **Streaming Job** 的方式,从多种数据源持续导入
| PostgreSQL | 14、15、16、17 | [PostgreSQL CDC SQL
映射同步](./continuous-load-postgresql-table.md) | [PostgreSQL CDC
自动建表同步](./continuous-load-postgresql-database.md) | [Amazon RDS
PostgreSQL](./prerequisites/amazon-rds-postgresql.md) · [Amazon Aurora
PostgreSQL](./prerequisites/amazon-aurora-postgresql.md) |
| S3 | - | [S3 持续导入](./continuous-load-s3.md)
| -
| -
|
+上游列类型如何映射为 Doris 类型,见[数据类型映射](./data-type-mapping.md)。
+
## 如何选择同步方式
<!-- 知识类型: 架构选型决策 -->
@@ -197,7 +199,24 @@ DROP JOB WHERE jobName = <job_name>;
| 参数 | 默认值 | 说明 |
| ------------ | ------ | -------------------------------------- |
-| max_interval | 10s | 当上游没有新增数据时,空闲的调度间隔。 |
+| max_interval | 10 | 当上游没有新增数据时,空闲的调度间隔,单位为秒。只接受整数(秒数),如 `10`;不支持带单位后缀(如
`10s`)。取值需 >= 1。 |
+
+## 使用限制
+
+<!-- 知识类型: 限制 / 不支持的边界 -->
+
+同步范围、自动建表、一致性语义(exactly-once /
at-least-once)已在[能力对比](#能力对比)中说明。本节只列出**不支持**的约束与行为。
+
+### 主键表
+
+仅支持同步**带主键**的上游表(两种同步方式均如此)。对应的 Doris 表为 **Unique Key 表**——自动建表同步下由 Doris
自动建成,SQL 映射同步下由你自己创建。无主键的表不支持。
+
+### Schema Change(DDL)
+
+DDL 同步**仅适用于自动建表同步**;SQL 映射(TVF)不同步任何 DDL。
+
+- **PostgreSQL**(4.1 起支持):仅 `ADD COLUMN` 和 `DROP COLUMN` 会同步。**列类型变更、`RENAME
COLUMN`、约束 / 索引 / 分区变更不会同步**——需在 Doris 端手工处理。
+- **MySQL**:上游 DDL **暂不同步**——需手工调整 Doris 表结构。
## FAQ
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
index 6e760fc89be..f0537c74b0b 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
@@ -95,7 +95,7 @@ PostgreSQL 源端参数用于配置 JDBC 连接、同步范围以及全量切片
| database | - | 数据库名
|
| schema | - | Schema 名称
|
| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表
|
-| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步
|
+| offset | latest | latest: 仅增量同步,initial: 全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数),全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
@@ -119,8 +119,8 @@ PostgreSQL 源端参数用于配置 JDBC 连接、同步范围以及全量切片
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM POSTGRES (
[source_properties]
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
index bab53b7467a..deff0043b52 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
@@ -102,7 +102,7 @@ select * from jobs("type"="insert") where ExecuteType =
"STREAMING";
| database | - | 数据库名
|
| schema | - | Schema 名称
|
| table | - | 需要同步的表名
|
-| offset | initial | `initial`:全量 + 增量同步;`latest`:仅增量同步
|
+| offset | latest | `latest`:仅增量同步;`initial`:全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数)。全量同步时,表会被切分成多个 split 进行同步
|
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量
|
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
index 8c7e8d09342..5a5364ef9b2 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
@@ -155,8 +155,8 @@ select * from S3(
```SQL
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -178,8 +178,8 @@ DO <Insert_Command>
| ------------------ | ------ |
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|
| session.\* | 无 | 支持在 `job_properties` 上配置所有的 session 变量,导入变量可参考
[Insert Into
Select](../../../../data-operate/import/import-way/insert-into-manual.md#导入配置参数)
|
| s3.max_batch_files | 256 | 当累计文件数达到该值时触发一次导入写入
|
-| s3.max_batch_bytes | 10G | 当累计数据量达到该值时触发一次导入写入
|
-| max_interval | 10s | 当上游没有新增文件或数据时,空闲的调度间隔
|
+| s3.max_batch_bytes | 10737418240 | 当累计数据量(字节)达到该值时触发一次导入写入。只接受纯整数字节数,如
`10737418240`;不支持带单位后缀(如 `10G`)。取值范围 100MB–10GB,即 104857600–10737418240
|
+| max_interval | 10 | 当上游没有新增文件或数据时,空闲的调度间隔,单位为秒。只接受整数(秒数),如
`10`;不支持带单位后缀(如 `10s`)。取值需 >= 1
|
:::tip 批次触发规则
`s3.max_batch_files` 与 `s3.max_batch_bytes` 满足任一条件即触发写入;当上游无新文件时,按
`max_interval` 进入空闲轮询。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/data-type-mapping.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/data-type-mapping.md
new file mode 100644
index 00000000000..9a047267508
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/import-way/streaming-job/data-type-mapping.md
@@ -0,0 +1,71 @@
+---
+{
+ "title": "数据类型映射",
+ "language": "zh-CN",
+ "description": "Doris Streaming Job CDC 的数据类型映射:自动建表时上游 MySQL / PostgreSQL
类型如何映射为 Doris 列类型,以及写入 Doris 时值如何转换。"
+}
+---
+
+<!-- 知识类型: 参考 / 数据类型映射 -->
+
+下表展示上游 MySQL / PostgreSQL 的每个列类型如何映射为 Doris
列类型。**只有一套映射**,两种同步方式共用,区别只在于由谁创建目标表:
+
+- **自动建表同步**:Doris 读取上游表结构,自动按这些 Doris 列类型创建目标表。
+- **SQL 映射同步**:你自己创建目标表,每列使用兼容的 Doris 类型。
+
+两种方式下,写入的值都遵循同一套映射。不支持的类型请参见[使用限制](./continuous-load-overview.md#使用限制)。
+
+## MySQL 到 Doris
+
+| MySQL 类型 | Doris 类型 | 备注 |
+| --- | --- | --- |
+| `BOOLEAN` / `TINYINT(1)` | `BOOLEAN` | |
+| `TINYINT` | `TINYINT` | `UNSIGNED` → `SMALLINT` |
+| `SMALLINT` | `SMALLINT` | `UNSIGNED` → `INT` |
+| `MEDIUMINT` | `INT` | `UNSIGNED` → `INT` |
+| `INT` | `INT` | `UNSIGNED` → `BIGINT` |
+| `BIGINT` | `BIGINT` | `UNSIGNED` → `LARGEINT` |
+| `YEAR` | `SMALLINT` | |
+| `FLOAT` | `FLOAT` | |
+| `DOUBLE` | `DOUBLE` | |
+| `DECIMAL(p,s)` | `DECIMAL(p,s)` | 超高精度回退为 `STRING` |
+| `DATE` | `DATE` | |
+| `DATETIME` | `DATETIME(s)` | 保留微秒精度(0–6) |
+| `TIMESTAMP` | `DATETIME(s)` | 按 server 时区归一 |
+| `TIME` | `STRING` | 以字符串存储,如 `12:34:56.000000` |
+| `CHAR` | `CHAR` | |
+| `VARCHAR` | `VARCHAR` | |
+| `TINYTEXT` / `TEXT` / `MEDIUMTEXT` / `LONGTEXT` | `STRING` | |
+| `BINARY` / `VARBINARY` / `TINYBLOB` / `BLOB` / `MEDIUMBLOB` / `LONGBLOB` |
`STRING` | Base64 编码字符串 |
+| `BIT(1)` | `BOOLEAN` | |
+| `BIT(n>1)` | `STRING` | Base64 编码字符串 |
+| `JSON` | `STRING` | 以 JSON 文本存储 |
+| `ENUM` | `STRING` | 解析为标签值 |
+| `SET` | `STRING` | 解析为逗号分隔的标签 |
+| 空间类型(`GEOMETRY`、`POINT` 等)及其他类型 | 不支持 | 含此类列的表自动建表会失败;可改用 SQL 映射同步(在
`SELECT` 中转换)或排除该列 |
+
+## PostgreSQL 到 Doris
+
+| PostgreSQL 类型 | Doris 类型 | 备注 |
+| --- | --- | --- |
+| `bool` | `BOOLEAN` | |
+| `bit(1)` | `BOOLEAN` | |
+| `bit(n>1)` / `varbit` | `STRING` | |
+| `int2` / `smallserial` | `SMALLINT` | |
+| `int4` / `serial` | `INT` | |
+| `int8` / `bigserial` | `BIGINT` | |
+| `float4` | `FLOAT` | |
+| `float8` | `DOUBLE` | |
+| `numeric(p,s)` | `DECIMAL(min(p,38), s)` | 精度上限 38;未指定 scale 时默认 9 |
+| `bpchar(n)`(即 `CHAR(n)`) | `CHAR` / `VARCHAR` | 长度 ×3 以兼容 UTF-8;`> 255` 转为
`VARCHAR` |
+| `varchar` / `text` | `STRING` | 注意:PostgreSQL `varchar` 映射为 `STRING`(不带长度) |
+| `date` | `DATE` | |
+| `timestamp` / `timestamptz` | `DATETIME(s)` | 精度 0–6;`timestamptz` 按 server
时区归一 |
+| `time` / `timetz` / `interval` | `STRING` | `timetz` 保留其 UTC 归一后的偏移量 |
+| `uuid` / `inet` / `cidr` / `macaddr` / `macaddr8` | `STRING` | |
+| `bytea` | `STRING` | Base64 编码字符串 |
+| `xml` / `hstore` | `STRING` | |
+| `json` / `jsonb` | `JSON` | |
+| 几何类型(`point`、`line`、`polygon` 等) | `STRING` | 序列化为 GeoJSON 字符串(`type` /
`coordinates` / `srid`) |
+| 数组类型(如 `int4[]`、`text[]`) | `ARRAY<T>` | 仅支持一维 |
+| 其他 / 用户自定义类型 | `STRING` | |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
index 5cd8fb4de0f..2fdab53329d 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
@@ -53,6 +53,9 @@ cdc_stream(
| 参数 | 默认值 | 描述
|
|----------------------|---------|--------------------------------------------------------------|
| `schema` | - | Schema 名称,PostgreSQL 必填
|
+| `offset` | latest | `latest`:仅增量;`initial`:全量 + 增量;另支持
`snapshot`、`earliest`(仅 MySQL)、或一个 JSON 形式的 binlog/LSN 位点 |
+| `snapshot_split_size` | 8096 | split 的大小(行数),全量同步时表会被切分成多个 split |
+| `snapshot_parallelism` | 1 | 全量阶段的并行度,即单次 Task 最多调度的 split 数量 |
## 注意事项
@@ -80,6 +83,7 @@ cdc_stream(
"password" = "123456",
"database" = "source_db",
"table" = "source_table",
+ "offset" = "initial"
)
```
@@ -100,6 +104,7 @@ cdc_stream(
"database" = "postgres",
"schema" = "public",
"table" = "source_table",
+ "offset" = "initial"
)
```
@@ -120,5 +125,6 @@ cdc_stream(
"password" = "123456",
"database" = "source_db",
"table" = "source_table",
+ "offset" = "initial"
)
```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
index d4968b71968..128539f20b5 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
@@ -23,12 +23,12 @@ Doris Streaming Job 是基于 Job 的方式,创建一个持续导入任务。
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -37,12 +37,12 @@ DO <Insert_Command>
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
FROM <sourceType> (
<source_property>
@@ -77,7 +77,7 @@ TO DATABASE <target_db> (
| database | - | 数据库名
|
| schema | - | Schema 名称(PostgreSQL)
|
| include_tables | - | 需要同步的表名,多个表用逗号分隔,不填默认所有的表 |
-| offset | initial | initial: 全量 + 增量同步,latest: 仅增量同步
|
+| offset | latest | `latest`: 仅增量同步(默认);`initial`: 全量 + 增量同步
|
| snapshot_split_size | 8096 | split 的大小(行数),全量同步时,表会被切分成多个 split 进行同步 |
| snapshot_parallelism | 1 | 全量阶段同步的并行度,即单次 Task 最多调度的 split 数量 |
@@ -100,8 +100,8 @@ TO DATABASE <target_db> (
| ------------------ | ------ |
------------------------------------------------------------ |
| session.* | 无 | 支持在 job_properties 上配置所有的 session 变量(仅 TVF 模式) |
| s3.max_batch_files | 256 | 当累计文件数达到该值时触发一次导入写入(仅 S3 TVF) |
-| s3.max_batch_bytes | 10G | 当累计数据量达到该值时触发一次导入写入(仅 S3 TVF) |
-| max_interval | 10s | 当上游没有新增文件或数据时,空闲的调度间隔 |
+| s3.max_batch_bytes | 10737418240 | 当累计数据量(字节)达到该值时触发一次导入写入。只接受纯整数字节数,如
`10737418240`;不支持带单位后缀(如 `10G`)。取值范围 100MB–10GB,即 104857600–10737418240。(仅 S3
TVF) |
+| max_interval | 10 | 当上游没有新增文件或数据时,空闲的调度间隔,单位为秒。只接受整数(秒数),如
`10`;不支持带单位后缀(如 `10s`)。取值需 >= 1 |
## 权限控制
@@ -121,11 +121,11 @@ TO DATABASE <target_db> (
```sql
CREATE JOB my_job
-ON STREAMING
PROPERTIES (
- "max_interval" = "30s",
+ "max_interval" = "30",
"session.insert_max_filter_ratio" = "0.5"
)
+ON STREAMING
DO
INSERT INTO db1.tbl1
SELECT * FROM S3(
@@ -177,7 +177,8 @@ SELECT * FROM S3(
"user" = "root",
"password" = "123456",
"database" = "source_db",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
@@ -197,7 +198,8 @@ SELECT * FROM S3(
"password" = "postgres",
"database" = "postgres",
"schema" = "public",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
diff --git a/sidebars.ts b/sidebars.ts
index 1e3da0c3192..9bf765b330a 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -358,6 +358,7 @@ const sidebars: SidebarsConfig = {
],
},
'data-operate/import/import-way/streaming-job/continuous-load-s3',
+
'data-operate/import/import-way/streaming-job/data-type-mapping',
],
},
],
diff --git
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
index c34d0660e8a..0b52d267d4f 100644
---
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
+++
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-database.md
@@ -144,7 +144,7 @@ The MySQL source side (`FROM MYSQL`) supports the following
parameters:
| password | - | Database password
|
| database | - | Database name
|
| include_tables | - | Names of tables to sync, separated by
commas. If left empty, all tables are synced by default |
-| offset | initial | `initial`: full + incremental sync;
`latest`: incremental sync only
|
+| offset | latest | `latest`: incremental sync only; `initial`:
full + incremental sync |
| snapshot_split_size | 8096 | Size of a split (in rows). During full
sync, a table is divided into multiple splits for sync |
| snapshot_parallelism | 1 | Parallelism during the full-sync phase,
that is, the maximum number of splits a single Task can schedule at a time |
@@ -158,8 +158,8 @@ The syntax for creating an Auto Table Creation Sync job is
as follows:
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM MYSQL (
[source_properties]
diff --git
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
index 01f010f9552..797aead44ca 100644
---
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
+++
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-mysql-table.md
@@ -92,7 +92,7 @@ The MySQL data source parameters supported by CDC Stream TVF
are as follows:
| password | - | Database password
|
| database | - | Database name
|
| table | - | Name of the table to sync
|
-| offset | initial | `initial`: full + incremental sync;
`latest`: incremental sync only
|
+| offset | latest | `latest`: incremental sync only;
`initial`: full + incremental sync
|
| snapshot_split_size | 8096 | Size of a split (in rows). During full
sync, the table is divided into multiple splits for syncing
|
| snapshot_parallelism | 1 | Parallelism during the full sync stage,
that is, the maximum number of splits scheduled per Task
|
diff --git
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
index d82cba73319..9b03d700601 100644
---
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
+++
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-overview.md
@@ -46,6 +46,8 @@ Continuous load supports the following data sources and sync
modes:
| PostgreSQL | 14, 15, 16, 17 | [PostgreSQL CDC with SQL
Mapping](./continuous-load-postgresql-table.md) | [PostgreSQL CDC with Auto
Table Creation](./continuous-load-postgresql-database.md) | [Amazon RDS
PostgreSQL](./prerequisites/amazon-rds-postgresql.md) · [Amazon Aurora
PostgreSQL](./prerequisites/amazon-aurora-postgresql.md) |
| S3 | - | [S3 Continuous
Load](./continuous-load-s3.md) | -
| -
|
+For how upstream column types map to Doris types, see [Data Type
Mapping](./data-type-mapping.md).
+
## How to Choose a Sync Method
<!-- Knowledge type: Architecture selection decision -->
@@ -197,7 +199,24 @@ DROP JOB WHERE jobName = <job_name>;
| Parameter | Default | Description
|
| ------------ | ------- |
------------------------------------------------------ |
-| max_interval | 10s | Idle scheduling interval when the upstream has no
new data. |
+| max_interval | 10 | Idle scheduling interval in seconds when the
upstream has no new data. Only an integer (number of seconds) is accepted, e.g.
`10`; a unit suffix such as `10s` is not supported. Must be >= 1. |
+
+## Limitations
+
+<!-- Knowledge type: Limitations / unsupported boundaries -->
+
+Sync scope, automatic table creation, and semantic guarantees (exactly-once /
at-least-once) are described in [Capability
Comparison](#capability-comparison). This section only lists the constraints
and behaviors that are **not supported**.
+
+### Primary Key Tables
+
+Only upstream tables **with a primary key** can be synchronized (both sync
methods). The corresponding Doris table is a **Unique Key** table —
auto-created as such in Auto Table Creation Sync, or created by you in SQL
Mapping Sync. Tables without a primary key are not supported.
+
+### Schema Change (DDL)
+
+DDL sync applies **only to Auto Table Creation Sync**; SQL Mapping (TVF) does
not sync any DDL.
+
+- **PostgreSQL** (supported since 4.1): only `ADD COLUMN` and `DROP COLUMN`
are synced. **Column type changes, `RENAME COLUMN`, and constraint / index /
partition changes are NOT synced** — apply them manually in Doris.
+- **MySQL**: upstream DDL is **not synced yet** — adjust the Doris table
schema manually.
## FAQ
diff --git
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
index cd81fef3613..6e597b9394d 100644
---
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
+++
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-database.md
@@ -95,7 +95,7 @@ PostgreSQL source parameters configure the JDBC connection,
sync scope, and full
| database | - | Database name
|
| schema | - | Schema name
|
| include_tables | - | Tables to sync, separated by commas. If
left empty, all tables are synced by default
|
-| offset | initial | initial: full + incremental sync; latest:
incremental-only sync
|
+| offset | latest | latest: incremental-only sync; initial:
full + incremental sync
|
| snapshot_split_size | 8096 | Size of a split (in rows). During full
sync, a table is divided into multiple splits for synchronization
|
| snapshot_parallelism | 1 | Parallelism during the full-sync phase,
that is, the maximum number of splits scheduled in a single Task
|
@@ -119,8 +119,8 @@ The syntax for creating an Auto Table Creation Sync job is
as follows:
```sql
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
FROM POSTGRES (
[source_properties]
diff --git
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
index 623679d1424..a9bc4d8042b 100644
---
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
+++
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-postgresql-table.md
@@ -102,7 +102,7 @@ Data source parameters are configured through the
`cdc_stream(...)` TVF. They de
| database | - | Database name.
|
| schema | - | Schema name.
|
| table | - | Name of the table to sync.
|
-| offset | initial | `initial`: full + incremental sync.
`latest`: incremental sync only.
|
+| offset | latest | `latest`: incremental sync only. `initial`:
full + incremental sync.
|
| snapshot_split_size | 8096 | Split size in rows. During full sync, the
table is divided into multiple splits for syncing.
|
| snapshot_parallelism | 1 | Parallelism for the full sync stage, that
is, the maximum number of splits scheduled per Task.
|
diff --git
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
index e830f867ff7..cca869dc64c 100644
---
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
+++
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/continuous-load-s3.md
@@ -155,8 +155,8 @@ The syntax for creating an S3 TVF continuous load job is as
follows:
```SQL
CREATE JOB <job_name>
-ON STREAMING
[job_properties]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -178,8 +178,8 @@ The following parameters can be configured in
`job_properties`:
| ------------------ | ------- |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
| session.\* | None | All session variables can be configured under
`job_properties`. For load variables, see [Insert Into
Select](../../../../data-operate/import/import-way/insert-into-manual.md#import-configuration-parameters)
|
| s3.max_batch_files | 256 | Triggers a load write when the cumulative
number of files reaches this value
|
-| s3.max_batch_bytes | 10G | Triggers a load write when the cumulative
data volume reaches this value
|
-| max_interval | 10s | The idle scheduling interval when there are
no new files or data from upstream
|
+| s3.max_batch_bytes | 10737418240 | Triggers a load write when the cumulative
data volume (in bytes) reaches this value. Only a plain integer byte count is
accepted, e.g. `10737418240`; a unit suffix such as `10G` is not supported.
Valid range is 100MB–10GB, i.e. 104857600–10737418240
|
+| max_interval | 10 | The idle scheduling interval in seconds when
there are no new files or data from upstream. Only an integer (number of
seconds) is accepted, e.g. `10`; a unit suffix such as `10s` is not supported.
Must be >= 1 |
:::tip Batch trigger rule
A write is triggered when either `s3.max_batch_files` or `s3.max_batch_bytes`
is satisfied. When there are no new files from upstream, idle polling proceeds
according to `max_interval`.
diff --git
a/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/data-type-mapping.md
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/data-type-mapping.md
new file mode 100644
index 00000000000..a1a6ee06daa
--- /dev/null
+++
b/versioned_docs/version-4.x/data-operate/import/import-way/streaming-job/data-type-mapping.md
@@ -0,0 +1,71 @@
+---
+{
+ "title": "Data Type Mapping",
+ "language": "en",
+ "description": "Data type mapping for Doris Streaming Job CDC: how
upstream MySQL and PostgreSQL types map to Doris column types during automatic
table creation, and how values are converted when written into Doris."
+}
+---
+
+<!-- Knowledge type: Reference / data type mapping -->
+
+The tables below show how each upstream MySQL or PostgreSQL column type maps
to a Doris column type. There is a single mapping, shared by both sync methods
— the only difference is who creates the target table:
+
+- **Auto Table Creation Sync**: Doris reads the upstream schema and creates
the target table with these Doris column types automatically.
+- **SQL Mapping Sync**: you create the target table yourself, using a
compatible Doris type for each column.
+
+In both cases, the values are written following the same mapping. For types
that are not supported, see
[Limitations](./continuous-load-overview.md#limitations).
+
+## MySQL to Doris
+
+| MySQL Type | Doris Type | Notes |
+| --- | --- | --- |
+| `BOOLEAN` / `TINYINT(1)` | `BOOLEAN` | |
+| `TINYINT` | `TINYINT` | `UNSIGNED` → `SMALLINT` |
+| `SMALLINT` | `SMALLINT` | `UNSIGNED` → `INT` |
+| `MEDIUMINT` | `INT` | `UNSIGNED` → `INT` |
+| `INT` | `INT` | `UNSIGNED` → `BIGINT` |
+| `BIGINT` | `BIGINT` | `UNSIGNED` → `LARGEINT` |
+| `YEAR` | `SMALLINT` | |
+| `FLOAT` | `FLOAT` | |
+| `DOUBLE` | `DOUBLE` | |
+| `DECIMAL(p,s)` | `DECIMAL(p,s)` | Very high precision falls back to `STRING`
|
+| `DATE` | `DATE` | |
+| `DATETIME` | `DATETIME(s)` | Microsecond scale (0–6) preserved |
+| `TIMESTAMP` | `DATETIME(s)` | Normalized to the server time zone |
+| `TIME` | `STRING` | Serialized as a string, e.g. `12:34:56.000000` |
+| `CHAR` | `CHAR` | |
+| `VARCHAR` | `VARCHAR` | |
+| `TINYTEXT` / `TEXT` / `MEDIUMTEXT` / `LONGTEXT` | `STRING` | |
+| `BINARY` / `VARBINARY` / `TINYBLOB` / `BLOB` / `MEDIUMBLOB` / `LONGBLOB` |
`STRING` | Base64-encoded string |
+| `BIT(1)` | `BOOLEAN` | |
+| `BIT(n>1)` | `STRING` | Base64-encoded string |
+| `JSON` | `STRING` | Stored as JSON text |
+| `ENUM` | `STRING` | Resolved to the label value |
+| `SET` | `STRING` | Resolved to comma-separated labels |
+| Spatial types (`GEOMETRY`, `POINT`, …) and other types | Not supported |
Auto Table Creation fails for tables with such columns; use SQL Mapping Sync
(cast in `SELECT`) or exclude the column |
+
+## PostgreSQL to Doris
+
+| PostgreSQL Type | Doris Type | Notes |
+| --- | --- | --- |
+| `bool` | `BOOLEAN` | |
+| `bit(1)` | `BOOLEAN` | |
+| `bit(n>1)` / `varbit` | `STRING` | |
+| `int2` / `smallserial` | `SMALLINT` | |
+| `int4` / `serial` | `INT` | |
+| `int8` / `bigserial` | `BIGINT` | |
+| `float4` | `FLOAT` | |
+| `float8` | `DOUBLE` | |
+| `numeric(p,s)` | `DECIMAL(min(p,38), s)` | Precision capped at 38; scale
defaults to 9 when unspecified |
+| `bpchar(n)` (i.e. `CHAR(n)`) | `CHAR` / `VARCHAR` | Length scaled ×3 for
UTF-8; `> 255` becomes `VARCHAR` |
+| `varchar` / `text` | `STRING` | PostgreSQL `varchar` maps to `STRING` (no
length) |
+| `date` | `DATE` | |
+| `timestamp` / `timestamptz` | `DATETIME(s)` | Scale 0–6; `timestamptz`
normalized to the server time zone |
+| `time` / `timetz` / `interval` | `STRING` | `timetz` keeps its
UTC-normalized offset |
+| `uuid` / `inet` / `cidr` / `macaddr` / `macaddr8` | `STRING` | |
+| `bytea` | `STRING` | Base64-encoded string |
+| `xml` / `hstore` | `STRING` | |
+| `json` / `jsonb` | `JSON` | |
+| Geometry types (`point`, `line`, `polygon`, …) | `STRING` | Serialized as a
GeoJSON string (`type` / `coordinates` / `srid`) |
+| Array types (e.g. `int4[]`, `text[]`) | `ARRAY<T>` | One-dimensional only |
+| Other / user-defined types | `STRING` | |
diff --git
a/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
b/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
index e96fa2521da..41741eb945c 100644
---
a/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
+++
b/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/cdc-stream.md
@@ -49,7 +49,7 @@ cdc_stream(
| Parameter | Default | Description
|
|------------------------|---------|--------------------------------------------------------------|
| `schema` | - | Schema name, required for PostgreSQL
|
-| `offset` | initial | `initial`: full + incremental sync;
`latest`: incremental only |
+| `offset` | latest | `latest`: incremental only; `initial`:
full + incremental; also supports `snapshot`, `earliest` (MySQL only), or a
JSON binlog/LSN position |
| `snapshot_split_size` | 8096 | Split size (in rows). During full sync,
the table is divided into multiple splits |
| `snapshot_parallelism` | 1 | Parallelism during full sync phase, i.e.,
max splits per task |
diff --git
a/versioned_docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
b/versioned_docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
index 5fa42fd6fbd..dc5b097ccdf 100644
---
a/versioned_docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
+++
b/versioned_docs/version-4.x/sql-manual/sql-statements/job/CREATE-STREAMING-JOB.md
@@ -23,12 +23,12 @@ For detailed usage, see [Continuous
Load](../../../data-operate/import/import-wa
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
DO <Insert_Command>
```
@@ -37,12 +37,12 @@ DO <Insert_Command>
```SQL
CREATE JOB <job_name>
-ON STREAMING
[ PROPERTIES (
<job_property>
[ , ... ]
)
]
+ON STREAMING
[ COMMENT <comment> ]
FROM <sourceType> (
<source_property>
@@ -77,7 +77,7 @@ TO DATABASE <target_db> (
| database | - | Database name
|
| schema | - | Schema name (PostgreSQL)
|
| include_tables | - | Tables to synchronize, comma separated. If not
specified, all tables will be synchronized by default. |
-| offset | initial | initial: full + incremental sync, latest:
incremental only |
+| offset | latest | `latest`: incremental only (default); `initial`:
full + incremental sync |
| snapshot_split_size | 8096 | Split size (in rows). During full sync, the
table is divided into multiple splits. |
| snapshot_parallelism | 1 | Parallelism during full sync phase, i.e., max
splits per task. |
@@ -100,8 +100,8 @@ TO DATABASE <target_db> (
| ------------------ | ------- |
------------------------------------------------------------ |
| session.* | - | Supports configuring all session variables in
job_properties (TVF Mode only) |
| s3.max_batch_files | 256 | Triggers an import write when the accumulated
file count reaches this value (S3 TVF only) |
-| s3.max_batch_bytes | 10G | Triggers an import write when the accumulated
data size reaches this value (S3 TVF only) |
-| max_interval | 10s | Idle scheduling interval when there are no
new files or data upstream |
+| s3.max_batch_bytes | 10737418240 | Triggers an import write when the
accumulated data size (in bytes) reaches this value. Only a plain integer byte
count is accepted, e.g. `10737418240`; a unit suffix such as `10G` is not
supported. Valid range is 100MB–10GB, i.e. 104857600–10737418240. (S3 TVF only)
|
+| max_interval | 10 | Idle scheduling interval in seconds when
there are no new files or data upstream. Only an integer (number of seconds) is
accepted, e.g. `10`; a unit suffix such as `10s` is not supported. Must be >= 1
|
## Privilege Control
@@ -153,7 +153,8 @@ The user executing this SQL command must have at least the
following privileges:
"user" = "root",
"password" = "123456",
"database" = "source_db",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
@@ -173,7 +174,8 @@ The user executing this SQL command must have at least the
following privileges:
"password" = "postgres",
"database" = "postgres",
"schema" = "public",
- "table" = "source_table"
+ "table" = "source_table",
+ "offset" = "initial"
);
```
diff --git a/versioned_sidebars/version-4.x-sidebars.json
b/versioned_sidebars/version-4.x-sidebars.json
index e3594eff415..8db0dbc8ce0 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -408,7 +408,8 @@
"data-operate/import/import-way/streaming-job/prerequisites/amazon-aurora-postgresql"
]
},
-
"data-operate/import/import-way/streaming-job/continuous-load-s3"
+
"data-operate/import/import-way/streaming-job/continuous-load-s3",
+
"data-operate/import/import-way/streaming-job/data-type-mapping"
]
}
]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]