This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new a83cd49 [KYUUBI #1866][FOLLOWUP] Add Deploy Kyuubi Flink engine on Yarn
a83cd49 is described below
commit a83cd49e1b89d5d5b7f25bee6429f0409b9f4b1e
Author: SteNicholas <[email protected]>
AuthorDate: Mon Mar 14 14:45:44 2022 +0800
[KYUUBI #1866][FOLLOWUP] Add Deploy Kyuubi Flink engine on Yarn
### _Why are the changes needed?_
Add `Deploy Kyuubi Flink engine on Yarn`.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes #2131 from SteNicholas/KYUUBI-1866.
Closes #1866
ba639f15 [SteNicholas] [KYUUBI #1866][FOLLOWUP] Add Deploy Kyuubi Flink engine on Yarn
cc6f4d44 [SteNicholas] [KYUUBI #1866][FOLLOWUP] Add Deploy Kyuubi Flink engine on Yarn
Authored-by: SteNicholas <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
docs/deployment/engine_on_yarn.md | 80 ++++++++++++++++++++++++++++++++-------
1 file changed, 66 insertions(+), 14 deletions(-)
diff --git a/docs/deployment/engine_on_yarn.md b/docs/deployment/engine_on_yarn.md
index 000bc26..79d7b0f 100644
--- a/docs/deployment/engine_on_yarn.md
+++ b/docs/deployment/engine_on_yarn.md
@@ -23,7 +23,9 @@
# Deploy Kyuubi engines on Yarn
-## Requirements
+## Deploy Kyuubi Spark Engine on Yarn
+
+### Requirements
When you want to deploy Kyuubi's Spark SQL engines on YARN, make sure the following requirements are met.
@@ -36,10 +38,9 @@ When you want to deploy Kyuubi's Spark SQL engines on YARN, you'd better have co
- An active Apache Hadoop HDFS cluster
- Set up Hadoop client configurations on the machine where the Kyuubi server is located
+### Configurations
-## Configurations
-
-### Environment
+#### Environment
Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` must be configured to point to the Hadoop client configuration directory, usually `$HADOOP_HOME/etc/hadoop`.
@@ -59,7 +60,7 @@ If the `SparkPi` passes, configure it in `$KYUUBI_HOME/conf/kyuubi-env.sh` or `$
$ echo "export HADOOP_CONF_DIR=/path/to/hadoop/conf" >> $KYUUBI_HOME/conf/kyuubi-env.sh
```
-### Spark Properties
+#### Spark Properties
These properties are defined by Spark, and Kyuubi will pass them to `spark-submit` to create Spark applications.
@@ -71,16 +72,16 @@ These properties are defined by Spark and Kyuubi will pass them to `spark-submit
**Note:** The priority goes down from top to bottom.
-#### Master
+##### Master
Setting `spark.master=yarn` tells Kyuubi to submit Spark SQL engine applications to the YARN cluster manager.
-#### Queue
+##### Queue
Set `spark.yarn.queue=thequeue` in the JDBC connection string to tell Kyuubi to use the queue `thequeue` in the YARN cluster; otherwise, the queue configured on the Kyuubi server side is used by default.
-#### Sizing
+##### Sizing
Pass the configurations below through the JDBC connection string to set how many Spark executor instances will be used, and how many CPUs and how much memory the Spark driver, the ApplicationMaster and each executor will take.
@@ -101,21 +102,72 @@ since the SQL engine will be long-running for a period, execute user's queries f
and the demand for computing resources is not the same for those queries.
It is better for Spark to release some executors when either the query is lightweight or the SQL engine is idle.
-
-#### Tuning
+##### Tuning
You can specify `spark.yarn.archive` or `spark.yarn.jars` to point to a world-readable location that contains Spark jars on HDFS, which allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs.
-#### Others
+##### Others
Please refer to [Spark properties](http://spark.apache.org/docs/latest/running-on-yarn.html#spark-properties) to check other acceptable configs.
-
-## Kerberos
+### Kerberos
Kyuubi currently does not support Spark's [YARN-specific Kerberos Configuration](http://spark.apache.org/docs/3.0.1/running-on-yarn.html#kerberos), so `spark.kerberos.keytab` and `spark.kerberos.principal` should not be used for now.
Instead, you can schedule a periodic `kinit` process via a `crontab` task on the local machine that hosts the Kyuubi server, or simply use [Kyuubi Kinit](settings.html#kinit).
-
\ No newline at end of file
+
+ ## Deploy Kyuubi Flink Engine on Yarn
+
+ ### Requirements
+
+ When you want to deploy Kyuubi's Flink SQL engines on YARN, make sure the following requirements are met.
+
+ - Basic knowledge of [Running Flink on YARN](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn)
+ - A binary distribution of Flink built with YARN support
+   - Download a recent Flink distribution from the [Flink official website](https://flink.apache.org/downloads.html) and unpack it
+ - An active [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster
+   - Make sure your YARN cluster is ready to accept Flink applications by running `yarn top`; it should show no error messages
+ - An active Apache Hadoop HDFS cluster
+ - Set up Hadoop client configurations on the machine where the Kyuubi server is located
+
+ ### Configurations
+
+ #### Environment
+
+ Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` must be configured to point to the Hadoop client configuration directory, usually `$HADOOP_HOME/etc/hadoop`.
+
+ If `HADOOP_CONF_DIR` points to the YARN and HDFS clusters correctly and the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session and submit an example job:
+ ```bash
+# we assume to be in the root directory of
+# the unzipped Flink distribution
+
+# (0) export HADOOP_CLASSPATH
+export HADOOP_CLASSPATH=`hadoop classpath`
+
+# (1) Start YARN Session
+./bin/yarn-session.sh --detached
+
+# (2) You can now access the Flink Web Interface through the
+# URL printed in the last lines of the command output, or through
+# the YARN ResourceManager web UI.
+
+# (3) Submit example job
+./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
+
+# (4) Stop YARN session (replace the application id based
+# on the output of the yarn-session.sh command)
+echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
+ ```
+
+ If the `TopSpeedWindowing` job runs successfully, configure `HADOOP_CONF_DIR` in `$KYUUBI_HOME/conf/kyuubi-env.sh` or `$FLINK_HOME/bin/config.sh`, e.g.
+
+ ```bash
+ $ echo "export HADOOP_CONF_DIR=/path/to/hadoop/conf" >> $KYUUBI_HOME/conf/kyuubi-env.sh
+ ```
+
+#### Deployment Modes Supported by Flink on YARN
+
+For experimental use, we recommend deploying the Kyuubi Flink SQL engine in [Session Mode](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#session-mode).
+At present, [Application Mode](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#application-mode) and [Per-Job Mode (deprecated)](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#per-job-mode-deprecated) are not supported for the Flink engine.
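In Session Mode the engine has to know which running YARN session to submit its jobs to. A minimal sketch, assuming the Kyuubi Flink SQL engine picks up its Flink configuration from `$FLINK_CONF_DIR/flink-conf.yaml` (defaulting to `$FLINK_HOME/conf`), is to set Flink's `execution.target` option to `yarn-session` and `yarn.application.id` to the ID printed by `yarn-session.sh`. `YARN_APP_ID` is a hypothetical variable, and `application_XXXXX_XXX` is a placeholder, as in the `yarn-session.sh` example above.

```bash
#!/usr/bin/env bash
# Sketch: point the Flink client used by the Kyuubi Flink SQL engine at an
# already running YARN session (Session Mode).
# Run it once; re-running appends the keys again.
set -euo pipefail

# Hypothetical variable; replace the placeholder with the application ID
# printed by yarn-session.sh.
YARN_APP_ID="${YARN_APP_ID:-application_XXXXX_XXX}"

# Assumption: the engine reads flink-conf.yaml from $FLINK_CONF_DIR, falling
# back to $FLINK_HOME/conf; without FLINK_HOME, the current directory is
# assumed to be the root of the unpacked Flink distribution.
FLINK_CONF_DIR="${FLINK_CONF_DIR:-${FLINK_HOME:-.}/conf}"
mkdir -p "$FLINK_CONF_DIR"

# execution.target selects Flink's YARN session executor, and
# yarn.application.id tells it which session to submit jobs to.
cat >> "$FLINK_CONF_DIR/flink-conf.yaml" <<EOF
execution.target: yarn-session
yarn.application.id: ${YARN_APP_ID}
EOF

echo "Updated $FLINK_CONF_DIR/flink-conf.yaml"
```

Editing `flink-conf.yaml` by hand is equivalent; the sketch only keeps the two keys next to the session ID they depend on.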