lvyanquan commented on code in PR #4337: URL: https://github.com/apache/flink-cdc/pull/4337#discussion_r2986468540
########## docs/content.zh/docs/get-started/quickstart/postgres-to-fluss.md: ########## @@ -0,0 +1,406 @@ +--- +title: "Postgres 同步到 Fluss" +weight: 5 +type: docs +aliases: +- /get-started/quickstart/postgres-to-fluss +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Streaming ELT 同步 Postgres 到 Fluss + +这篇教程将展示如何基于 Flink CDC 快速构建 PostgreSQL 到 Fluss 的 Streaming ELT 作业,包含整库同步和表结构变更同步的功能。 +本教程的演示都将在 Flink CDC CLI 中进行,无需一行 Java/Scala 代码,也无需安装 IDE。 + +## 准备阶段 +准备一台已经安装了 Docker 的 Linux 或者 MacOS 电脑。 + +### 准备 Flink Standalone 集群 +1. 下载 [Flink 1.20.3](https://archive.apache.org/dist/flink/flink-1.20.3/flink-1.20.3-bin-scala_2.12.tgz),解压后得到 flink-1.20.3 目录。 + 使用下面的命令跳转至 Flink 目录下,并且设置 FLINK_HOME 为 flink-1.20.3 所在目录。 + + ```shell + cd flink-1.20.3 + ``` + +2. 通过在 conf/config.yaml 配置文件追加下列参数开启 checkpoint,每隔 3 秒做一次 checkpoint。 + + ```yaml + execution: + checkpointing: + interval: 3s + ``` + +3. 
使用下面的命令启动 Flink 集群。 + + ```shell + ./bin/start-cluster.sh + ``` + +启动成功的话,可以在 [http://localhost:8081/](http://localhost:8081/) 访问到 Flink Web UI。 + +多次执行 `start-cluster.sh` 可以拉起多个 TaskManager。 + +### 准备 Docker 环境 +接下来的教程将以 `docker-compose` 的方式准备所需要的组件。 + +使用下面的内容创建一个 `docker-compose.yml` 文件: + + ```yaml + services: + # Fluss 集群 + coordinator-server: + image: apache/fluss:0.9.0-incubating + command: coordinatorServer + depends_on: + - zookeeper + environment: + - | + FLUSS_PROPERTIES= + zookeeper.address: zookeeper:2181 + bind.listeners: INTERNAL://coordinator-server:0, CLIENT://coordinator-server:9123 + advertised.listeners: CLIENT://localhost:9123 + internal.listener.name: INTERNAL + remote.data.dir: /tmp/fluss/remote-data + security.protocol.map: CLIENT:SASL, INTERNAL:PLAINTEXT + security.sasl.enabled.mechanisms: PLAIN + security.sasl.plain.jaas.config: org.apache.fluss.security.auth.sasl.plain.PlainLoginModule required user_admin="admin-pass" user_developer="developer-pass" ; + super.users: User:admin + ports: + - "9123:9123" + tablet-server: + image: apache/fluss:0.9.0-incubating + command: tabletServer + depends_on: + - coordinator-server + environment: + - | + FLUSS_PROPERTIES= + zookeeper.address: zookeeper:2181 + bind.listeners: INTERNAL://tablet-server:0, CLIENT://tablet-server:9123 + advertised.listeners: CLIENT://localhost:9124 + internal.listener.name: INTERNAL + tablet-server.id: 0 + kv.snapshot.interval: 0s + data.dir: /tmp/fluss/data + remote.data.dir: /tmp/fluss/remote-data + security.protocol.map: CLIENT:SASL, INTERNAL:PLAINTEXT + security.sasl.enabled.mechanisms: PLAIN + security.sasl.plain.jaas.config: org.apache.fluss.security.auth.sasl.plain.PlainLoginModule required user_admin="admin-pass" user_developer="developer-pass" ; + super.users: User:admin + ports: + - "9124:9123" + zookeeper: + restart: always + image: zookeeper:3.9.2 + # PostgreSQL + postgres: + image: postgres:14.5 + environment: + POSTGRES_USER: root + POSTGRES_PASSWORD: password + 
POSTGRES_DB: postgres + ports: + - "5432:5432" + volumes: + - postgres_data:/var/lib/postgresql/data + command: + - "postgres" + - "-c" + - "wal_level=logical" + - "-c" + - "max_replication_slots=5" + - "-c" + - "max_wal_senders=5" + - "-c" + - "hot_standby=on" + volumes: + postgres_data: + ``` + +该 Docker Compose 中包含的容器有: +- **Fluss**(coordinator-server, tablet-server, zookeeper):目标数据湖仓 +- **PostgreSQL**:源数据库,已开启逻辑复制(`wal_level=logical`) + +在 `docker-compose.yml` 所在目录下执行下面的命令来启动本教程需要的组件: + + ```shell + docker-compose up -d + ``` + +该命令将以 detached 模式自动启动 Docker Compose 配置中定义的所有容器。你可以通过 `docker ps` 来观察上述的容器是否正常启动了。 + +#### 在 PostgreSQL 数据库中准备数据 +1. 连接 PostgreSQL 数据库 + + ```shell + psql -h localhost -p 5432 -U root postgres + ``` + 密码为:`password` + +2. 创建 `adb` 数据库并切换 + + ```sql + CREATE DATABASE adb; + \c adb Review Comment: Is this a mistaken operation? ########## docs/content/docs/get-started/quickstart/postgres-to-fluss.md: ########## @@ -0,0 +1,408 @@ +--- +title: "Postgres to Fluss" +weight: 5 +type: docs +aliases: +- /get-started/quickstart/postgres-to-fluss +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> + +# Streaming ELT from Postgres to Fluss + +This tutorial shows how to quickly build a Streaming ELT job from PostgreSQL to Fluss using Flink CDC, including +full-database synchronization and schema change synchronization. +All exercises in this tutorial are performed in the Flink CDC CLI, and the entire process uses standard SQL syntax, +without a single line of Java/Scala code or IDE installation. + +## Preparation +Prepare a Linux or macOS computer with Docker installed. + +### Prepare Flink Standalone cluster +1. Download [Flink 1.20.3](https://archive.apache.org/dist/flink/flink-1.20.3/flink-1.20.3-bin-scala_2.12.tgz), unzip it to obtain the flink-1.20.3 directory. + Use the following command to navigate to the Flink directory, and set FLINK_HOME to the directory where flink-1.20.3 is located. + + ```shell + cd flink-1.20.3 + ``` + +2. Enable checkpointing by appending the following parameters to the conf/config.yaml configuration file to perform a checkpoint every 3 seconds. + + ```yaml + execution: + checkpointing: + interval: 3s + ``` + +3. Start the Flink cluster using the following command. + + ```shell + ./bin/start-cluster.sh + ``` + +If successfully started, you can access the Flink Web UI at [http://localhost:8081/](http://localhost:8081/). + +Executing `start-cluster.sh` multiple times starts additional `TaskManager`s. + +### Prepare docker compose +The following tutorial will prepare the required components using `docker-compose`.
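Before writing the compose file, it helps to confirm that a Docker CLI and a Compose command are actually available on the machine. A minimal sketch (not part of the original tutorial; the reported versions and install method vary per system):

```shell
# Check for the Docker CLI and either Compose flavor; the tutorial's
# "docker-compose up -d" works with the legacy standalone binary, while
# newer installs ship the "docker compose" plugin instead.
if command -v docker >/dev/null 2>&1; then
  docker_cli="present"
else
  docker_cli="missing"
fi
if command -v docker-compose >/dev/null 2>&1 || docker compose version >/dev/null 2>&1; then
  compose_cli="present"
else
  compose_cli="missing"
fi
echo "docker: ${docker_cli}, compose: ${compose_cli}"
```

If either reports `missing`, install Docker Engine (or Docker Desktop) before continuing; if only the plugin form is available, substitute `docker compose up -d` for `docker-compose up -d` below.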
+ +Create a `docker-compose.yml` file using the content provided below: + + ```yaml + services: + # Fluss cluster + coordinator-server: + image: apache/fluss:0.9.0-incubating + command: coordinatorServer + depends_on: + - zookeeper + environment: + - | + FLUSS_PROPERTIES= + zookeeper.address: zookeeper:2181 + bind.listeners: INTERNAL://coordinator-server:0, CLIENT://coordinator-server:9123 + advertised.listeners: CLIENT://localhost:9123 + internal.listener.name: INTERNAL + remote.data.dir: /tmp/fluss/remote-data + security.protocol.map: CLIENT:SASL, INTERNAL:PLAINTEXT + security.sasl.enabled.mechanisms: PLAIN + security.sasl.plain.jaas.config: org.apache.fluss.security.auth.sasl.plain.PlainLoginModule required user_admin="admin-pass" user_developer="developer-pass" ; + super.users: User:admin + ports: + - "9123:9123" + tablet-server: + image: apache/fluss:0.9.0-incubating + command: tabletServer + depends_on: + - coordinator-server + environment: + - | + FLUSS_PROPERTIES= + zookeeper.address: zookeeper:2181 + bind.listeners: INTERNAL://tablet-server:0, CLIENT://tablet-server:9123 + advertised.listeners: CLIENT://localhost:9124 + internal.listener.name: INTERNAL + tablet-server.id: 0 + kv.snapshot.interval: 0s + data.dir: /tmp/fluss/data + remote.data.dir: /tmp/fluss/remote-data + security.protocol.map: CLIENT:SASL, INTERNAL:PLAINTEXT + security.sasl.enabled.mechanisms: PLAIN + security.sasl.plain.jaas.config: org.apache.fluss.security.auth.sasl.plain.PlainLoginModule required user_admin="admin-pass" user_developer="developer-pass" ; + super.users: User:admin + ports: + - "9124:9123" + zookeeper: + restart: always + image: zookeeper:3.9.2 + # PostgreSQL + postgres: + image: postgres:14.5 + environment: + POSTGRES_USER: root + POSTGRES_PASSWORD: password + POSTGRES_DB: postgres + ports: + - "5432:5432" + volumes: + - postgres_data:/var/lib/postgresql/data + command: + - "postgres" + - "-c" + - "wal_level=logical" + - "-c" + - "max_replication_slots=5" + - "-c" + - 
"max_wal_senders=5" + - "-c" + - "hot_standby=on" + volumes: + postgres_data: + ``` + +The Docker Compose includes the following services: +- **Fluss** (coordinator-server, tablet-server, zookeeper): the target data lakehouse +- **PostgreSQL**: the source database with logical replication enabled (`wal_level=logical`) + +To start all containers, run the following command in the directory that contains the `docker-compose.yml` file. + + ```shell + docker-compose up -d + ``` + +This command automatically starts all the containers defined in the Docker Compose configuration in a detached mode. Run `docker ps` to check whether these containers are running properly. + +#### Prepare records for PostgreSQL +1. Connect to the PostgreSQL database + + ```shell + psql -h localhost -p 5432 -U root postgres + ``` + The password is: `password` + +2. Create the `adb` database and switch to it + + ```sql + CREATE DATABASE adb; + \c adb Review Comment: The same question. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
