[GitHub] [incubator-seatunnel] TaoZex commented on a diff in pull request #3619: [Doc] improve README and other documents

GitBox Wed, 30 Nov 2022 08:08:25 -0800


TaoZex commented on code in PR #3619:
URL: https://github.com/apache/incubator-seatunnel/pull/3619#discussion_r1036128849



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.

Review Comment:
   ```suggestion
   - Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline-incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
   ```



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
+- High demand in resource: Existing data integration and data synchronization 
tools often require vast computing resources or JDBC connection resources to 
complete real-time synchronization of massive small tables. This has increased 
the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization 
processes often experience loss or duplication of data. The synchronization 
process lacks monitoring, and it is impossible to intuitively understand the 
real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are 
different, and users need to develop corresponding synchronization programs for 
different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark) , offline synchronization and real-time 
synchronization often have be developed and managed separately, which increases 
thedifficulty of the management and maintainance.

Review Comment:
   ```suggestion
   - Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark), offline synchronization and real-time 
synchronization often have to be developed and managed separately, which 
increases the difficulty of management and maintenance.
   ```



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
+- High demand in resource: Existing data integration and data synchronization 
tools often require vast computing resources or JDBC connection resources to 
complete real-time synchronization of massive small tables. This has increased 
the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization 
processes often experience loss or duplication of data. The synchronization 
process lacks monitoring, and it is impossible to intuitively understand the 
real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are 
different, and users need to develop corresponding synchronization programs for 
different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark) , offline synchronization and real-time 
synchronization often have be developed and managed separately, which increases 
thedifficulty of the management and maintainance.
 
-## SeaTunnel use scenarios
+## Features of SeaTunnel
 
-- Mass data synchronization
-- Mass data integration
-- ETL with massive data
-- Mass data aggregation
-- Multi-source data processing
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does 
not depend on a specific execution engine. Connectors (Source, Transform, Sink) 
developed based on this API can run On many different engines, such as 
SeaTunnel Engine, Flink, Spark that are currently supported.
+- Connector plug-in: The plug-in design allows users to easily develop their 
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel 
has supported more than 70 Connectors, and the number is surging. There is the 
list of the currently-supported connectors: xxxxxxx, and t he list of planned 
connectors: xxxxxxx.

Review Comment:
   ```suggestion
   - Connector plugin: The plugin design allows users to easily develop their 
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel 
supports more than 70 Connectors, and the number is growing quickly. Here is the 
list of currently supported connectors: xxxxxxx, and the list of planned 
connectors: xxxxxxx.
   ```



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
+- High demand in resource: Existing data integration and data synchronization 
tools often require vast computing resources or JDBC connection resources to 
complete real-time synchronization of massive small tables. This has increased 
the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization 
processes often experience loss or duplication of data. The synchronization 
process lacks monitoring, and it is impossible to intuitively understand the 
real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are 
different, and users need to develop corresponding synchronization programs for 
different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark) , offline synchronization and real-time 
synchronization often have be developed and managed separately, which increases 
thedifficulty of the management and maintainance.
 
-## SeaTunnel use scenarios
+## Features of SeaTunnel
 
-- Mass data synchronization
-- Mass data integration
-- ETL with massive data
-- Mass data aggregation
-- Multi-source data processing
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does 
not depend on a specific execution engine. Connectors (Source, Transform, Sink) 
developed based on this API can run On many different engines, such as 
SeaTunnel Engine, Flink, Spark that are currently supported.
+- Connector plug-in: The plug-in design allows users to easily develop their 
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel 
has supported more than 70 Connectors, and the number is surging. There is the 
list of the currently-supported connectors: xxxxxxx, and t he list of planned 
connectors: xxxxxxx.
+- Batch-stream integration: Connectors developed based on SeaTunnel Connector 
API are perfectly compatible with offline synchronization, real-time 
synchronization, full- synchronization, incremental synchronization and other 
scenarios. It greatly reduces the difficulty of managing data integration tasks.
+- Support distributed snapshot algorithm to ensure data consistency.
+- Multi-engine support: SeaTunnel uses SeaTunnel Engine for data 
synchronization by default. At the same time, SeaTunnel also supports the use 
of Flink or Spark as the execution engine of the Connector to adapt to the 
existing technical components of the enterprise. SeaTunnel supports multiple 
versions of Spark and Flink.
+- JDBC multiplexing, database log multi-table parsing: SeaTunnel supports 
multi-table or whole database synchronization, which solves the problem of 
over- JDBC connections; supports multi-table or whole database log reading and 
parsing, which solves the need for CDC multi-table synchronization scenarios 
Problems with repeated reading and parsing of logs.
+- High throughput and low latency: SeaTunnel supports parallel reading and 
writing, providing stable and reliable data synchronization capabilities with 
high throughput and low latency.
+- Perfect real-time monitoring: SeaTunnel supports detailed monitoring 
information of each step in the data synchronization process, allowing users to 
easily understand the number of data, data size, QPS and other information read 
and written by the synchronization task.
+- Two job development methods are supported: coding and canvas design: The 
SeaTunnel web project https://github.com/apache/incubator-seatunnel-web 
provides visual management of jobs, scheduling, running and monitoring 
capabilities.

Review Comment:
   ```suggestion
   - Two job development methods are supported: coding and canvas design. The 
SeaTunnel web project https://github.com/apache/incubator-seatunnel-web 
provides visual management of jobs, scheduling, running and monitoring 
capabilities.
   ```



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
+- High demand in resource: Existing data integration and data synchronization 
tools often require vast computing resources or JDBC connection resources to 
complete real-time synchronization of massive small tables. This has increased 
the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization 
processes often experience loss or duplication of data. The synchronization 
process lacks monitoring, and it is impossible to intuitively understand the 
real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are 
different, and users need to develop corresponding synchronization programs for 
different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark) , offline synchronization and real-time 
synchronization often have be developed and managed separately, which increases 
thedifficulty of the management and maintainance.
 
-## SeaTunnel use scenarios
+## Features of SeaTunnel
 
-- Mass data synchronization
-- Mass data integration
-- ETL with massive data
-- Mass data aggregation
-- Multi-source data processing
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does 
not depend on a specific execution engine. Connectors (Source, Transform, Sink) 
developed based on this API can run On many different engines, such as 
SeaTunnel Engine, Flink, Spark that are currently supported.

Review Comment:
   ```suggestion
   - Rich and extensible Connector: SeaTunnel provides a Connector API that 
does not depend on a specific execution engine. Connectors (Source, Transform, 
Sink) developed based on this API can run on many different engines, such as 
the currently supported SeaTunnel Engine, Flink, and Spark.
   ```



##########
docs/en/start-v2/locally/quick-start-seatunnel-engine.md:
##########
@@ -0,0 +1,88 @@
+---
+sidebar_position: 2
+---
+
+# Quick Start With SeaTunnel Engine
+
+## Step 1: Deployment SeaTunnel And Connectors
+
+Before starting, make sure you have downloaded and deployed SeaTunnel as 
described in [deployment](deployment.md)
+
+## Step 2: Add Job Config File to define a job
+
+Edit `config/seatunnel.streaming.conf.template`, which determines the way and 
logic of data input, processing, and output after seatunnel is started.
+The following is an example of the configuration file, which is the same as 
the example application mentioned above.
+
+```hocon
+env {
+  execution.parallelism = 1
+  job.mode = "BATCH"
+}
+
+source {
+    FakeSource {
+      result_table_name = "fake"
+      row.num = 16
+      schema = {
+        fields {
+          name = "string"
+          age = "int"
+        }
+      }
+    }
+}
+
+transform {
+
+}
+
+sink {
+  Console {}
+}
+
+```
+
+More information about config please check [config concept](../concept/config)
+
+## Step 3: Run SeaTunnel Application
+
+You could start the application by the following commands
+
+```shell
+cd "apache-seatunnel-incubating-${version}"
+./bin/seatunnel.sh --config ./config/seatunnel.streaming.conf.template -e local
+
+```
+
+**See The Output**: When you run the command, you could see its output in your 
console, You can think this

Review Comment:
   ```suggestion
   **See The Output**: When you run the command, you can see its output in 
your console. You can think this
   ```



##########
docs/en/start-v2/locally/quick-start-spark.md:
##########
@@ -0,0 +1,99 @@
+---
+sidebar_position: 4
+---
+
+# Quick Start With Spark
+
+## Step 1: Deployment SeaTunnel And Connectors
+
+Before starting, make sure you have downloaded and deployed SeaTunnel as 
described in [deployment](deployment.md)
+
+## Step 2: Deployment And Config Spark
+
+Please [download Spark](https://spark.apache.org/downloads.html) 
first(**required version >= 2 and version < 3.x **). For more information you 
could
+see [Getting Started: 
standalone](https://spark.apache.org/docs/latest/spark-standalone.html#installing-spark-standalone-to-a-cluster)
+
+**Configure SeaTunnel**: Change the setting in `config/seatunnel-env.sh`, it 
is base on the path your engine install at [deployment](deployment.md).
+Change `SPARK_HOME` to the Spark deployment dir.
+
+
+## Step 3: Add Job Config File to define a job
+
+Edit `config/seatunnel.streaming.conf.template`, which determines the way and 
logic of data input, processing, and output after seatunnel is started.
+The following is an example of the configuration file, which is the same as 
the example application mentioned above.
+
+```hocon
+env {
+  execution.parallelism = 1
+  job.mode = "BATCH"
+}
+
+source {
+    FakeSource {
+      result_table_name = "fake"
+      row.num = 16
+      schema = {
+        fields {
+          name = "string"
+          age = "int"
+        }
+      }
+    }
+}
+
+transform {
+
+}
+
+sink {
+  Console {}
+}
+
+```
+
+More information about config please check [config concept](../concept/config)
+
+## Step 3: Run SeaTunnel Application
+
+You could start the application by the following commands
+
+```shell
+cd "apache-seatunnel-incubating-${version}"
+./bin/start-seatunnel-spark-connector-v2.sh \
+--master local[4] \
+--deploy-mode client \
+--config ./config/seatunnel.streaming.conf.template
+```
+
+**See The Output**: When you run the command, you could see its output in your 
console, You can think this

Review Comment:
   ```suggestion
   **See The Output**: When you run the command, you can see its output in 
your console. You can think this
   ```



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
+- High demand in resource: Existing data integration and data synchronization 
tools often require vast computing resources or JDBC connection resources to 
complete real-time synchronization of massive small tables. This has increased 
the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization 
processes often experience loss or duplication of data. The synchronization 
process lacks monitoring, and it is impossible to intuitively understand the 
real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are 
different, and users need to develop corresponding synchronization programs for 
different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark) , offline synchronization and real-time 
synchronization often have be developed and managed separately, which increases 
thedifficulty of the management and maintainance.
 
-## SeaTunnel use scenarios
+## Features of SeaTunnel
 
-- Mass data synchronization
-- Mass data integration
-- ETL with massive data
-- Mass data aggregation
-- Multi-source data processing
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does 
not depend on a specific execution engine. Connectors (Source, Transform, Sink) 
developed based on this API can run On many different engines, such as 
SeaTunnel Engine, Flink, Spark that are currently supported.
+- Connector plug-in: The plug-in design allows users to easily develop their 
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel 
has supported more than 70 Connectors, and the number is surging. There is the 
list of the currently-supported connectors: xxxxxxx, and t he list of planned 
connectors: xxxxxxx.
+- Batch-stream integration: Connectors developed based on SeaTunnel Connector 
API are perfectly compatible with offline synchronization, real-time 
synchronization, full- synchronization, incremental synchronization and other 
scenarios. It greatly reduces the difficulty of managing data integration tasks.
+- Support distributed snapshot algorithm to ensure data consistency.
+- Multi-engine support: SeaTunnel uses SeaTunnel Engine for data 
synchronization by default. At the same time, SeaTunnel also supports the use 
of Flink or Spark as the execution engine of the Connector to adapt to the 
existing technical components of the enterprise. SeaTunnel supports multiple 
versions of Spark and Flink.

Review Comment:
   ```suggestion
   - Multi-engine support: SeaTunnel uses SeaTunnel Engine for data 
synchronization by default. At the same time, SeaTunnel also supports the use 
of Flink or Spark as the execution engine of the Connector to adapt to the 
existing technical components of the enterprise. In addition, SeaTunnel 
supports multiple versions of Spark and Flink.
   ```



##########
docs/en/start-v2/locally/quick-start-flink.md:
##########
@@ -0,0 +1,96 @@
+---
+sidebar_position: 3
+---
+
+# Quick Start With Flink
+
+## Step 1: Deployment SeaTunnel And Connectors
+
+Before starting, make sure you have downloaded and deployed SeaTunnel as 
described in [deployment](deployment.md)
+
+## Step 2: Deployment And Config Flink
+
+Please [download Flink](https://flink.apache.org/downloads.html) 
first(**required version >= 1.12.0 and version < 1.14.x **). For more 
information you could see [Getting Started: 
standalone](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/standalone/overview/)
+
+**Configure SeaTunnel**: Change the setting in `config/seatunnel-env.sh`, it 
is base on the path your engine install at [deployment](deployment.md).
+Change `FLINK_HOME` to the Flink deployment dir.
+
+
+## Step 3: Add Job Config File to define a job
+
+Edit `config/seatunnel.streaming.conf.template`, which determines the way and 
logic of data input, processing, and output after seatunnel is started.
+The following is an example of the configuration file, which is the same as 
the example application mentioned above.
+
+```hocon
+env {
+  execution.parallelism = 1
+  job.mode = "BATCH"
+}
+
+source {
+    FakeSource {
+      result_table_name = "fake"
+      row.num = 16
+      schema = {
+        fields {
+          name = "string"
+          age = "int"
+        }
+      }
+    }
+}
+
+transform {
+
+}
+
+sink {
+  Console {}
+}
+
+```
+
+More information about config please check [config concept](../concept/config)
+
+## Step 3: Run SeaTunnel Application
+
+You could start the application by the following commands
+
+```shell
+cd "apache-seatunnel-incubating-${version}"
+./bin/start-seatunnel-flink-connector-v2.sh --config 
./config/seatunnel.streaming.conf.template
+
+```
+
+**See The Output**: When you run the command, you could see its output in your 
console, You can think this

Review Comment:
   ```suggestion
   **See The Output**: When you run the command, you can see its output in 
your console. You can think this
   ```



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
+- High demand in resource: Existing data integration and data synchronization 
tools often require vast computing resources or JDBC connection resources to 
complete real-time synchronization of massive small tables. This has increased 
the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization 
processes often experience loss or duplication of data. The synchronization 
process lacks monitoring, and it is impossible to intuitively understand the 
real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are 
different, and users need to develop corresponding synchronization programs for 
different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark) , offline synchronization and real-time 
synchronization often have be developed and managed separately, which increases 
thedifficulty of the management and maintainance.
 
-## SeaTunnel use scenarios
+## Features of SeaTunnel
 
-- Mass data synchronization
-- Mass data integration
-- ETL with massive data
-- Mass data aggregation
-- Multi-source data processing
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does 
not depend on a specific execution engine. Connectors (Source, Transform, Sink) 
developed based on this API can run On many different engines, such as 
SeaTunnel Engine, Flink, Spark that are currently supported.
+- Connector plug-in: The plug-in design allows users to easily develop their 
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel 
has supported more than 70 Connectors, and the number is surging. There is the 
list of the currently-supported connectors: xxxxxxx, and t he list of planned 
connectors: xxxxxxx.

Review Comment:
   Using `plug-in` is right, but `plugin` is better.



##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 
 ## Why do we need SeaTunnel
 
-SeaTunnel will do its best to solve the problems that may be encountered in 
the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly 
designed to solve common problems in the field of data integration:
 
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of 
which versions are incompatible. With the emergence of new technologies, more 
data sources are appearing. It is difficult for users to find a tool that can 
fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support 
various synchronization scenarios such as offline-full synchronization, 
offline- incremental synchronization, CDC, real-time synchronization, and full 
database synchronization.
+- High demand in resource: Existing data integration and data synchronization 
tools often require vast computing resources or JDBC connection resources to 
complete real-time synchronization of massive small tables. This has increased 
the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization 
processes often experience loss or duplication of data. The synchronization 
process lacks monitoring, and it is impossible to intuitively understand the 
real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are 
different, and users need to develop corresponding synchronization programs for 
different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying 
technology components (Flink/Spark) , offline synchronization and real-time 
synchronization often have be developed and managed separately, which increases 
thedifficulty of the management and maintainance.
 
-## SeaTunnel use scenarios
+## Features of SeaTunnel
 
-- Mass data synchronization
-- Mass data integration
-- ETL with massive data
-- Mass data aggregation
-- Multi-source data processing
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does 
not depend on a specific execution engine. Connectors (Source, Transform, Sink) 
developed based on this API can run On many different engines, such as 
SeaTunnel Engine, Flink, Spark that are currently supported.
+- Connector plug-in: The plug-in design allows users to easily develop their 
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel 
has supported more than 70 Connectors, and the number is surging. There is the 
list of the currently-supported connectors: xxxxxxx, and t he list of planned 
connectors: xxxxxxx.
+- Batch-stream integration: Connectors developed based on SeaTunnel Connector 
API are perfectly compatible with offline synchronization, real-time 
synchronization, full- synchronization, incremental synchronization and other 
scenarios. It greatly reduces the difficulty of managing data integration tasks.
+- Support distributed snapshot algorithm to ensure data consistency.
+- Multi-engine support: SeaTunnel uses SeaTunnel Engine for data 
synchronization by default. At the same time, SeaTunnel also supports the use 
of Flink or Spark as the execution engine of the Connector to adapt to the 
existing technical components of the enterprise. SeaTunnel supports multiple 
versions of Spark and Flink.
+- JDBC multiplexing, database log multi-table parsing: SeaTunnel supports 
multi-table or whole database synchronization, which solves the problem of 
over- JDBC connections; supports multi-table or whole database log reading and 
parsing, which solves the need for CDC multi-table synchronization scenarios 
Problems with repeated reading and parsing of logs.

Review Comment:
   ```suggestion
   - JDBC multiplexing, database log multi-table parsing: SeaTunnel supports 
multi-table or whole-database synchronization, which solves the problem of 
excessive JDBC connections; it also supports multi-table or whole-database log 
reading and parsing, which solves the problem of repeated log reading and 
parsing in CDC multi-table synchronization scenarios.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
