This is an automated email from the ASF dual-hosted git repository.
renqs pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink-cdc.git
The following commit(s) were added to refs/heads/master by this push:
new 32c8e0bc2 [FLINK-34686][cdc] add docs about yarn deployment (#3163)
32c8e0bc2 is described below
commit 32c8e0bc2a2a8b3f6ea73d2f86becbec8f7522a9
Author: Shawn Huang <[email protected]>
AuthorDate: Tue Mar 19 16:17:32 2024 +0800
[FLINK-34686][cdc] add docs about yarn deployment (#3163)
---
docs/content/docs/deployment/yarn.md | 128 +++++++++++++++++++++++++++++++++++
1 file changed, 128 insertions(+)
diff --git a/docs/content/docs/deployment/yarn.md
b/docs/content/docs/deployment/yarn.md
index d0621273a..ec9700d89 100644
--- a/docs/content/docs/deployment/yarn.md
+++ b/docs/content/docs/deployment/yarn.md
@@ -23,3 +23,131 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Introduction
+
+[Apache Hadoop
YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
is a resource provider popular with many data processing frameworks.
+Flink services are submitted to YARN's ResourceManager, which spawns
containers on machines managed by YARN NodeManagers. Flink deploys its
JobManager and TaskManager instances into such containers.
+
+Flink can dynamically allocate and de-allocate TaskManager resources depending
on the number of processing slots required by the job(s) running on the
JobManager.
+
+## Preparation
+
+This *Getting Started* section assumes a functional YARN environment, starting
from version 2.10.2. YARN environments are provided most conveniently through
services such as Amazon EMR, Google Cloud DataProc or products like Cloudera.
[Manually setting up a YARN environment
locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html)
or [on a
cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html)
is no [...]
+
+- Make sure your YARN cluster is ready for accepting Flink applications by
running `yarn top`. It should show no error messages.
+- Download a recent Flink distribution from the [download
page](https://flink.apache.org/downloads/) and unpack it.
+- **Important** Make sure that the `HADOOP_CLASSPATH` environment variable is
set up (it can be checked by running `echo $HADOOP_CLASSPATH`). If not, set it
up using
+
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
+
+## Session Mode
+
+Flink runs on all UNIX-like environments, i.e. Linux, Mac OS X, and Cygwin
(for Windows).
+You can refer [overview]({{< ref "docs/connectors/overview" >}}) to check
supported versions and download [the binary
release](https://flink.apache.org/downloads/) of Flink,
+then extract the archive:
+
+```bash
+tar -xzf flink-*.tgz
+```
+
+You should set `FLINK_HOME` environment variables like:
+
+```bash
+export FLINK_HOME=/path/flink-*
+```
+
+### Starting a Flink Session on YARN
+
+Once you've made sure that the `HADOOP_CLASSPATH` environment variable is set,
you can launch a Flink on YARN session:
+
+```bash
+# we assume to be in the root directory of
+# the unzipped Flink distribution
+
+# export HADOOP_CLASSPATH
+export HADOOP_CLASSPATH=`hadoop classpath`
+
+# Start YARN session
+./bin/yarn-session.sh --detached
+
+# Stop YARN session (replace the application id based
+# on the output of the yarn-session.sh command)
+echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
+```
+
+After starting YARN session, you can now access the Flink Web UI through the
URL printed in the last lines of the command output, or through the YARN
ResourceManager web UI.
+
+Then, you need to add some configs to your flink-conf.yaml:
+
+```yaml
+rest.bind-port: {{REST_PORT}}
+rest.address: {{NODE_IP}}
+execution.target: yarn-session
+yarn.application.id: {{YARN_APPLICATION_ID}}
+```
+
+{{REST_PORT}} and {{NODE_IP}} should be replaced by the actual values of your
JobManager Web Interface, and {{YARN_APPLICATION_ID}} should be replaced by the
actual YARN application ID of Flink.
+
+### Set up Flink CDC
+Download the tar file of Flink CDC from [release
page](https://github.com/apache/flink-cdc/releases), then extract the archive:
+
+```bash
+tar -xzf flink-cdc-*.tar.gz
+```
+
+Extracted `flink-cdc` contains four directories: `bin`,`lib`,`log` and `conf`.
+
+Download the connector jars from [release
page](https://github.com/apache/flink-cdc/releases), and move it to the `lib`
directory.
+Download links are available only for stable releases, SNAPSHOT dependencies
need to be built based on specific branch by yourself.
+
+### Submit a Flink CDC Job
+Here is an example file for synchronizing the entire database
`mysql-to-doris.yaml`:
+
+```yaml
+################################################################################
+# Description: Sync MySQL all tables to Doris
+################################################################################
+source:
+ type: mysql
+ hostname: localhost
+ port: 3306
+ username: root
+ password: 123456
+ tables: app_db.\.*
+ server-id: 5400-5404
+ server-time-zone: UTC
+
+sink:
+ type: doris
+ fenodes: 127.0.0.1:8030
+ username: root
+ password: ""
+
+pipeline:
+ name: Sync MySQL Database to Doris
+ parallelism: 2
+
+```
+
+You need to modify the configuration file according to your needs.
+Finally, submit job to Flink Standalone cluster using Cli.
+
+```bash
+cd /path/flink-cdc-*
+./bin/flink-cdc.sh mysql-to-doris.yaml
+```
+
+After successful submission, the return information is as follows:
+
+```bash
+Pipeline has been submitted to cluster.
+Job ID: ae30f4580f1918bebf16752d4963dc54
+Job Description: Sync MySQL Database to Doris
+```
+
+You can find a job named `Sync MySQL Database to Doris` running through Flink
Web UI.
+
+Please note that submitting to application mode cluster and per-job mode
cluster are not supported for now.