This is an automated email from the ASF dual-hosted git repository.
leonard pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink-cdc.git
The following commit(s) were added to refs/heads/master by this push:
new e73f3adaa [FLINK-34679][cdc][docs] Add core concept pages for Flink CDC docs
e73f3adaa is described below
commit e73f3adaa365c65882fa0863385e90612d44b278
Author: kunni <[email protected]>
AuthorDate: Mon Mar 18 19:44:13 2024 +0800
[FLINK-34679][cdc][docs] Add core concept pages for Flink CDC docs
This closes #3153.
---
docs/content/docs/core-concept/data-pipeline.md | 77 +++++++++++++++++++++++++
docs/content/docs/core-concept/data-sink.md | 25 ++++++++
docs/content/docs/core-concept/data-source.md | 26 +++++++++
docs/content/docs/core-concept/route.md | 49 ++++++++++++++++
docs/content/docs/core-concept/table-id.md | 15 +++++
docs/content/docs/core-concept/transform.md | 7 +++
6 files changed, 199 insertions(+)
diff --git a/docs/content/docs/core-concept/data-pipeline.md b/docs/content/docs/core-concept/data-pipeline.md
index a1cf1986e..3903c922b 100644
--- a/docs/content/docs/core-concept/data-pipeline.md
+++ b/docs/content/docs/core-concept/data-pipeline.md
@@ -23,3 +23,80 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Definition
+Since events in Flink CDC flow from upstream to downstream in a pipeline manner, the whole ETL task is referred to as a **Data Pipeline**.
+
+# Parameters
+A pipeline corresponds to a chain of operators in Flink.
+To describe a Data Pipeline, the following parts are required:
+- [source]({{< ref "docs/core-concept/data-source" >}})
+- [sink]({{< ref "docs/core-concept/data-sink" >}})
+- [pipeline](#pipeline-configurations)
+
+The following parts are optional:
+- [route]({{< ref "docs/core-concept/route" >}})
+- [transform]({{< ref "docs/core-concept/transform" >}})
+
+# Example
+## Only required
+The following YAML file defines a concise Data Pipeline that synchronizes all tables in the MySQL `app_db` database to Doris:
+
+```yaml
+source:
+  type: mysql
+  hostname: localhost
+  port: 3306
+  username: root
+  password: 123456
+  tables: app_db.\.*
+
+sink:
+  type: doris
+  fenodes: 127.0.0.1:8030
+  username: root
+  password: ""
+
+pipeline:
+  name: Sync MySQL Database to Doris
+  parallelism: 2
+```
+
+## With optional
+The following YAML file defines a more complex Data Pipeline that synchronizes all tables in the MySQL `app_db` database to Doris, writing them to the target database `ods_db` with the table name prefix `ods_`:
+
+```yaml
+source:
+  type: mysql
+  hostname: localhost
+  port: 3306
+  username: root
+  password: 123456
+  tables: app_db.\.*
+
+sink:
+  type: doris
+  fenodes: 127.0.0.1:8030
+  username: root
+  password: ""
+route:
+  - source-table: app_db.orders
+    sink-table: ods_db.ods_orders
+  - source-table: app_db.shipments
+    sink-table: ods_db.ods_shipments
+  - source-table: app_db.products
+    sink-table: ods_db.ods_products
+
+pipeline:
+  name: Sync MySQL Database to Doris
+  parallelism: 2
+```
+
+# Pipeline Configurations
+The following pipeline-level configuration options are supported:
+
+| parameter       | meaning                                                                                 | optional/required |
+|-----------------|-----------------------------------------------------------------------------------------|-------------------|
+| name            | The name of the pipeline, which will be submitted to the Flink cluster as the job name. | optional          |
+| parallelism     | The global parallelism of the pipeline.                                                 | required          |
+| local-time-zone | The time zone ID of the current session.                                                | optional          |
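+
+As a rough sketch of how the options above fit together, the `pipeline` block below sets all three. The value `UTC` for `local-time-zone` is an assumed example rather than a value taken from this page; check which time zone ID formats your version accepts.
+
+```yaml
+pipeline:
+  name: Sync MySQL Database to Doris  # optional; submitted to Flink as the job name
+  parallelism: 2                       # global parallelism of the pipeline
+  local-time-zone: UTC                 # optional; assumed example value
+```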
\ No newline at end of file
diff --git a/docs/content/docs/core-concept/data-sink.md b/docs/content/docs/core-concept/data-sink.md
index 9c86f00f6..2dab1dc4a 100644
--- a/docs/content/docs/core-concept/data-sink.md
+++ b/docs/content/docs/core-concept/data-sink.md
@@ -23,3 +23,28 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Definition
+**Data Sink** is used to apply schema changes and write change data to external systems.
+A Data Sink can write to multiple tables simultaneously.
+
+# Parameters
+To describe a Data Sink, the following parameters are used:
+
+| parameter                   | meaning                                                                                         | optional/required |
+|-----------------------------|-------------------------------------------------------------------------------------------------|-------------------|
+| type                        | The type of the sink, such as doris or starrocks.                                               | required          |
+| name                        | The name of the sink, which is user-defined (a default value is provided).                      | optional          |
+| configurations of Data Sink | Configurations to build the Data Sink e.g. connection configurations and sink table properties. | optional          |
+
+# Example
+The following YAML defines a Doris sink:
+```yaml
+sink:
+  type: doris
+  name: doris-sink           # Optional parameter for description purpose
+  fenodes: 127.0.0.1:8030
+  username: root
+  password: ""
+  table.create.properties.replication_num: 1  # Optional parameter for advanced functionalities
+```
\ No newline at end of file
diff --git a/docs/content/docs/core-concept/data-source.md b/docs/content/docs/core-concept/data-source.md
index d2859bd58..5d6c33deb 100644
--- a/docs/content/docs/core-concept/data-source.md
+++ b/docs/content/docs/core-concept/data-source.md
@@ -23,3 +23,29 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Definition
+**Data Source** is used to access metadata and read the changed data from external systems.
+A Data Source can read data from multiple tables simultaneously.
+
+# Parameters
+To describe a Data Source, the following parameters are used:
+
+| parameter                     | meaning                                                                                             | optional/required |
+|-------------------------------|-----------------------------------------------------------------------------------------------------|-------------------|
+| type                          | The type of the source, such as mysql.                                                              | required          |
+| name                          | The name of the source, which is user-defined (a default value is provided).                        | optional          |
+| configurations of Data Source | Configurations to build the Data Source e.g. connection configurations and source table properties. | optional          |
+
+# Example
+The following YAML defines a MySQL source:
+```yaml
+source:
+  type: mysql
+  name: mysql-source   # Optional, for description purposes
+  hostname: localhost
+  port: 3306
+  username: admin
+  password: pass
+  tables: adb.*, bdb.user_table_[0-9]+, [app|web]_order_\.*
+```
\ No newline at end of file
diff --git a/docs/content/docs/core-concept/route.md b/docs/content/docs/core-concept/route.md
index 9dbe80c03..0a8c906fb 100644
--- a/docs/content/docs/core-concept/route.md
+++ b/docs/content/docs/core-concept/route.md
@@ -23,3 +23,52 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Definition
+**Route** specifies rules that match a list of source tables and map them to sink tables. The most typical scenario is merging sharded databases and tables, where multiple upstream source tables are routed to the same sink table.
+
+# Parameters
+To describe a route, the following parameters are used:
+
+| parameter    | meaning                                                | optional/required |
+|--------------|--------------------------------------------------------|-------------------|
+| source-table | Source table id, supports regular expressions          | required          |
+| sink-table   | Sink table id, supports regular expressions            | required          |
+| description  | Routing rule description (a default value is provided) | optional          |
+
+A route module can contain a list of source-table/sink-table rules.
+
+# Example
+## Route one Data Source table to one Data Sink table
+To synchronize the table `web_order` in the database `mydb` to a Doris table `ods_web_order`, the following YAML defines the route:
+
+```yaml
+route:
+  - source-table: mydb.web_order
+    sink-table: mydb.ods_web_order
+    description: sync table to one destination table with given prefix ods_
+```
+
+## Route multiple Data Source tables to one Data Sink table
+To synchronize the sharded tables in the database `mydb` into a single Doris table `ods_web_order`, the following YAML defines the route:
+```yaml
+route:
+  - source-table: mydb\.*
+    sink-table: mydb.ods_web_order
+    description: sync sharding tables to one destination table
+```
+
+## Complex Route via combining route rules
+To specify several different mapping rules at once, list them under the same `route` key:
+```yaml
+route:
+  - source-table: mydb.orders
+    sink-table: ods_db.ods_orders
+    description: sync orders table to orders
+  - source-table: mydb.shipments
+    sink-table: ods_db.ods_shipments
+    description: sync shipments table to ods_shipments
+  - source-table: mydb.products
+    sink-table: ods_db.ods_products
+    description: sync products table to ods_products
+```
\ No newline at end of file
diff --git a/docs/content/docs/core-concept/table-id.md b/docs/content/docs/core-concept/table-id.md
index 83769301c..261c8fd09 100644
--- a/docs/content/docs/core-concept/table-id.md
+++ b/docs/content/docs/core-concept/table-id.md
@@ -23,3 +23,18 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Definition
+When connecting to external systems, it is necessary to establish a mapping relationship with the storage objects of the external system. This is what **Table Id** refers to.
+
+# Example
+To be compatible with most external systems, the Table Id is represented by a 3-tuple: (namespace, schemaName, tableName).
+Connectors should establish the mapping between Table Id and storage objects in external systems.
+
+The following table lists the parts of the Table Id for different data systems:
+
+| data system | parts in tableId | String example |
+|-----------------------|--------------------------|---------------------|
+| Oracle/PostgreSQL | database, schema, table | mydb.default.orders |
+| MySQL/Doris/StarRocks | database, table | mydb.orders |
+| Kafka | topic | orders |
diff --git a/docs/content/docs/core-concept/transform.md b/docs/content/docs/core-concept/transform.md
index 76015dea1..0ffa24829 100644
--- a/docs/content/docs/core-concept/transform.md
+++ b/docs/content/docs/core-concept/transform.md
@@ -23,3 +23,10 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Definition
+The **Transform** module helps users delete and expand data columns based on the existing data columns in the table.
+It also helps users filter out unnecessary data during synchronization.
+
+# Example
+This feature will be supported soon.
\ No newline at end of file