fuweng11 commented on code in PR #1089: URL: https://github.com/apache/inlong-website/pull/1089#discussion_r1865183511
########## docs/quick_start/offline_data_sync/airflow_pulsar_mysql_example.md: ##########
@@ -0,0 +1,126 @@
+---
+title: Example of Airflow Offline Synchronization
+sidebar_position: 3
+---
+The following is a complete example of how to create Airflow scheduling tasks with Apache InLong and complete an offline data synchronization from Pulsar to MySQL.
+
+## Deployment
+### Install InLong
+
+Before we begin, we need to install InLong. Two deployment options are available:
+- [Docker Deployment](deployment/docker.md) (Recommended)
+- [Bare Metal Deployment](deployment/bare_metal.md)
+
+### Add Connectors
+
+Download the [connectors](https://inlong.apache.org/downloads/) corresponding to your Flink version and, after decompression, place `sort-connector-jdbc-[version]-SNAPSHOT.jar` in the `/inlong-sort/connectors/` directory.
+> Currently, Apache InLong's offline data synchronization capability only supports Flink 1.18, so please download the 1.18 version of the connectors.
+
+## Create Clusters And Data Target
+
+### Create Cluster Label
+
+### Register Pulsar Cluster
+
+### Create Data Target
+
+Execute the following SQL statement:
+
+```sql
+CREATE TABLE sink_table (
+    id INT AUTO_INCREMENT PRIMARY KEY,
+    name VARCHAR(255) NOT NULL,
+    create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+## Airflow Initialization
+
+### Get Initial DAG
+
+The initial DAGs can be obtained from the [InLong](https://github.com/apache/inlong) repository.
+
+> Airflow does not provide an API for creating DAGs, so two initial DAGs are required: `dag_creator` creates offline tasks, and `dag_cleaner` periodically cleans them up.
+
+### Create Initial DAG
+
+Place the DAG files in Airflow's default DAG directory and wait a while; the Airflow scheduler will scan the directory and load them:
+
+### Airflow REST API
+
+By default, Airflow rejects all REST API requests.
+Please refer to the [Airflow official documentation](https://airflow.apache.org/docs/apache-airflow-providers-fab/stable/auth-manager/api-authentication.html) for configuration.
+
+### Inlong Manager Configuration
+
+Modify the configuration file as required and restart Inlong Manager.
+```properties
+# Inlong Manager URL accessible by the scheduler
+schedule.engine.inlong.manager.url=http://192.168.101.2:8083

Review Comment:
   Do not provide a real address.

########## docs/quick_start/offline_data_sync/airflow_pulsar_mysql_example.md: ##########
@@ -0,0 +1,126 @@
+### Inlong Manager Configuration
+
+Modify the configuration file as required and restart Inlong Manager.
+```properties
+# Inlong Manager URL accessible by the scheduler
+schedule.engine.inlong.manager.url=http://192.168.101.2:8083
+# Management URL for Airflow
+schedule.engine.airflow.baseUrl=http://192.168.101.16:8080

Review Comment:
   Ditto.
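In line with the review comments above ("Do not provide a real address"), the quoted snippet could use obvious placeholders instead of routable IPs; the bracketed hostnames below are illustrative only:

```properties
# Inlong Manager URL accessible by the scheduler (replace the placeholder host)
schedule.engine.inlong.manager.url=http://<inlong-manager-host>:8083
# Management URL for Airflow (replace the placeholder host)
schedule.engine.airflow.baseUrl=http://<airflow-host>:8080
```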
########## i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/offline_data_sync/airflow_pulsar_mysql_example.md: ##########
@@ -0,0 +1,131 @@
+---
+title: Example of Airflow Offline Synchronization
+sidebar_position: 3
+---
+The following is a complete example of how to use Apache InLong to create Airflow scheduling tasks and complete an offline Pulsar -> MySQL data synchronization.
+
+## Deployment
+### Install InLong
+
+Before we begin, we need to install all InLong components. Two deployment options are available:
+- [Docker Deployment](deployment/docker.md) (Recommended)
+- [Bare Metal Deployment](deployment/bare_metal.md)
+
+### Add Connectors
+
+Download the [connectors](https://inlong.apache.org/zh-CN/downloads) corresponding to your Flink version and, after decompression, place `sort-connector-jdbc-[version]-SNAPSHOT.jar` in the `/inlong-sort/connectors/` directory.
+> Currently, Apache InLong's offline data synchronization capability only supports Flink 1.18, so please download the 1.18 version of the connectors.
+
+## Create Clusters And Data Target
+
+### Create Cluster Label
+
+### Register Pulsar Cluster
+
+### Create Data Target
+
+Execute the following SQL statement:
+
+```sql
+CREATE TABLE sink_table (
+    id INT AUTO_INCREMENT PRIMARY KEY,
+    name VARCHAR(255) NOT NULL,
+    create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+## Airflow Initialization
+
+### Get Initial DAG
+
+The initial DAGs can be obtained from the [InLong](https://github.com/apache/inlong) repository.
+
+> Airflow does not provide an API for creating DAGs, so two initial DAGs are required: `dag_creator` creates offline tasks, and `dag_cleaner` periodically cleans them up.
+
+### Create Initial DAG
+
+First, place the DAG files in Airflow's default DAG directory and wait a while; the Airflow scheduler will scan the directory and load them:
+
+### Airflow REST API
+
+By default, Airflow rejects all REST API requests. Please refer to the [Airflow official documentation](https://airflow.apache.org/docs/apache-airflow-providers-fab/stable/auth-manager/api-authentication.html) for configuration.
+
+### Inlong Manager Configuration
+
+Modify the configuration file as required and restart Inlong Manager.
+
+```properties
+# Inlong Manager URL accessible by the scheduler
+schedule.engine.inlong.manager.url=http://192.168.101.2:8083
+# Management URL for Airflow
+schedule.engine.airflow.baseUrl=http://192.168.101.16:8080

Review Comment:
   Ditto.
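For the "Airflow REST API" section quoted above: enabling authenticated API access is configured in `airflow.cfg`. The backend path below follows the linked FAB provider documentation but differs across Airflow versions, so treat it as a sketch rather than the exact setting:

```ini
[api]
auth_backends = airflow.providers.fab.auth_manager.api.auth.backend.basic_auth
```

On older Airflow 2.x releases the equivalent backend is `airflow.api.auth.backend.basic_auth`.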
########## i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/offline_data_sync/airflow_pulsar_mysql_example.md: ##########
@@ -0,0 +1,131 @@
+### Inlong Manager Configuration
+
+Modify the configuration file as required and restart Inlong Manager.
+
+```properties
+# Inlong Manager URL accessible by the scheduler
+schedule.engine.inlong.manager.url=http://192.168.101.2:8083

Review Comment:
   Ditto.

--
This is an automated message from the Apache Git Service.
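Once API authentication is configured, the `dag_creator` DAG quoted above would be triggered through Airflow's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). A minimal sketch of composing such a request follows; the placeholder host and the `conf` field are illustrative assumptions, not the actual payload Inlong Manager sends:

```python
import json

# Illustrative values only; per the review comments, never hard-code a
# real deployment address in documentation.
AIRFLOW_BASE_URL = "http://<airflow-host>:8080"
DAG_ID = "dag_creator"  # one of the two initial DAGs shipped with InLong

# Airflow's stable REST API triggers a DAG run with
# POST /api/v1/dags/{dag_id}/dagRuns and a JSON body containing "conf".
url = f"{AIRFLOW_BASE_URL}/api/v1/dags/{DAG_ID}/dagRuns"

# Hypothetical conf payload; the real fields are whatever dag_creator
# expects from Inlong Manager.
body = json.dumps({"conf": {"inlong_group_id": "example_group"}})

print(url)
print(body)
```

Sending it against a running Airflow would look like `requests.post(url, data=body, auth=(user, password), headers={"Content-Type": "application/json"})`, assuming basic-auth is enabled as in the section above.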
