This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new 8f7b2c6 [KYUUBI #1866][DOCS] Add flink sql engine quick start
8f7b2c6 is described below
commit 8f7b2c66401034b3238dc40238f42b0afe2de31d
Author: Ada Wong <[email protected]>
AuthorDate: Sat Mar 12 17:07:35 2022 +0800
[KYUUBI #1866][DOCS] Add flink sql engine quick start
### _Why are the changes needed?_
Add quick start documents of the Flink SQL Engine.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including
negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run
test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests)
locally before make a pull request
Closes #2106 from deadwind4/KYUUBI-1866-quickstart.
Closes #1866
2533aafd [Ada Wong] remove Yarn section
6aa4db8a [Ada Wong] compress png
ff6bff72 [Ada Wong] [KYUUBI #1866][DOCS] Add flink sql engine quick start
Authored-by: Ada Wong <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
docs/imgs/flink/flink_jobs_page.png | Bin 0 -> 65991 bytes
docs/quick_start/quick_start.md | 159 +++++++++++++++++++++++++++++++-----
2 files changed, 140 insertions(+), 19 deletions(-)
diff --git a/docs/imgs/flink/flink_jobs_page.png
b/docs/imgs/flink/flink_jobs_page.png
new file mode 100644
index 0000000..aae22db
Binary files /dev/null and b/docs/imgs/flink/flink_jobs_page.png differ
diff --git a/docs/quick_start/quick_start.md b/docs/quick_start/quick_start.md
index b03c034..eeb78b3 100644
--- a/docs/quick_start/quick_start.md
+++ b/docs/quick_start/quick_start.md
@@ -36,49 +36,51 @@ You can get the most recent stable release of Apache Kyuubi
here:
## Requirements
These are the essential components required for Kyuubi to start up.
-For quick start deployment, the only thing you need is `JAVA_HOME` and
`SPARK_HOME` being correctly set.
+For quick start deployment, the only thing you need is to have `JAVA_HOME`
correctly set.
The Kyuubi release package you downloaded or built already contains the rest
of the prerequisites.
Components | Role | Required/Optional | Version | Remarks
--- | --- | --- | --- | ---
Java | Java<br>Runtime<br>Environment | Required | Java 8/11 | Kyuubi is
pre-built with Java 8
-Spark | Distributed<br>SQL<br>Engine | Required | 3.0.0 and above | By default
Kyuubi binary release is delivered without<br> a Spark tarball.
+Spark | Distributed<br>SQL<br>Engine | Optional | 3.0.0 and above | By default
Kyuubi binary release is delivered without<br> a Spark tarball.
+Flink | Distributed<br>SQL<br>Engine | Optional | 1.14.0 and above | By
default Kyuubi binary release is delivered without<br> a Flink tarball.
HDFS | Distributed<br>File<br>System | Optional | referenced<br>by<br>Spark |
Hadoop Distributed File System is a <br>part of the Hadoop framework, used
to<br> store and process the datasets.<br> You can interact with any<br>
Spark-compatible version of HDFS.
Hive | Metastore | Optional | referenced<br>by<br>Spark | Hive Metastore for
Spark SQL to connect
Zookeeper | Service<br>Discovery | Optional |
Any<br>zookeeper<br>ensemble<br>compatible<br>with<br>curator(2.12.0) | By
default, Kyuubi provides an<br> embedded Zookeeper server for<br>
non-production use.
-Additionally, if you want to work with other Spark compatible systems or
plugins, you only need to take care of them as using them with regular Spark
applications.
-For example, you can run Spark SQL engines created by the Kyuubi on any
cluster manager, including YARN, Kubernetes, Mesos, e.t.c...
-Or, you can manipulate data from different data sources with the Spark
Datasource API, e.g. Delta Lake, Apache Hudi, Apache Iceberg, Apache Kudu and
e.t.c...
+Additionally, if you want to work with other Spark/Flink-compatible systems or
plugins, you only need to handle them as you would with regular Spark/Flink
applications.
+For example, you can run Spark/Flink SQL engines created by Kyuubi on any
cluster manager, including YARN, Kubernetes, Mesos, etc.
+Or, you can manipulate data from different data sources with the Spark
Datasource/Flink Table API, e.g. Delta Lake, Apache Hudi, Apache Iceberg,
Apache Kudu, etc.
## Installation
To install Kyuubi, you need to unpack the tarball. For example,
```bash
-tar zxf apache-kyuubi-1.3.1-incubating-bin.tgz
+tar zxf apache-kyuubi-1.5.0-incubating-bin.tgz
```
-This will result in the creation of a subdirectory named
`apache-kyuubi-1.3.1-incubating-bin` shown below,
+This will result in the creation of a subdirectory named
`apache-kyuubi-1.5.0-incubating-bin` shown below,
```bash
-apache-kyuubi-1.3.1-incubating-bin
+apache-kyuubi-1.5.0-incubating-bin
├── DISCLAIMER
├── LICENSE
├── NOTICE
├── RELEASE
+├── beeline-jars
├── bin
├── conf
| ├── kyuubi-defaults.conf.template
│ ├── kyuubi-env.sh.template
-│ └── log4j.properties.template
+│ └── log4j2.properties.template
├── docker
│ ├── Dockerfile
+│ ├── helm
│ ├── kyuubi-configmap.yaml
+│ ├── kyuubi-deployment.yaml
│ ├── kyuubi-pod.yaml
│ └── kyuubi-service.yaml
-├── extension
-│ └── kyuubi-extension-spark-3-1_2.12-1.3.1-incubating.jar
├── externals
│ └── engines
├── jars
@@ -97,7 +99,7 @@ From top to bottom are:
- bin: the entry of the Kyuubi server with `kyuubi` as the startup script.
- conf: all the defaults used by Kyuubi Server itself or creating a session
with Spark applications.
- externals
- - engines: contains all kinds of SQL engines that we support, e.g. Apache
Spark, Apache Flink(coming soon).
+ - engines: contains all kinds of SQL engines that we support, e.g. Apache
Spark, Apache Flink, Trino (coming soon).
- licenses: a bunch of licenses included.
- jars: packages needed by the Kyuubi server.
- logs: where the logs of the Kyuubi server are located.
@@ -106,7 +108,11 @@ From top to bottom are:
## Running Kyuubi
-As mentioned above, for a quick start deployment, then only you need to be
sure is that your java runtime environment and `SPARK_HOME` are correct.
+As mentioned above, for a quick start deployment, the only thing you need to
make sure of is that the environments below are correct:
+
+- Java runtime environment
+- `SPARK_HOME` for the Spark engine
+- `FLINK_HOME` and `kyuubi.engine.type` in
`$KYUUBI_HOME/conf/kyuubi-defaults.conf` for the Flink engine.
### Setup JAVA
@@ -132,7 +138,9 @@ Java HotSpot(TM) 64-Bit Server VM 18.9 (build
11.0.5+10-LTS, mixed mode)
The recommended place to set `JAVA_HOME` is `$KYUUBI_HOME/conf/kyuubi-env.sh`,
as the ways above are too flaky.
The `JAVA_HOME` in `$KYUUBI_HOME/conf/kyuubi-env.sh` will take precedence over
the others.
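For example, a minimal entry in `$KYUUBI_HOME/conf/kyuubi-env.sh` might look like the sketch below; the JDK path is an assumption for illustration, so substitute your own:

```bash
# $KYUUBI_HOME/conf/kyuubi-env.sh
# The JDK location below is illustrative only; point it at your actual JDK.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
```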
-### Setup Spark
+### Spark Engine
+
+#### Setup Spark
Similar to `JAVA_HOME`, you can also set `SPARK_HOME` in different ways.
However, we recommend setting it in `$KYUUBI_HOME/conf/kyuubi-env.sh` too.
@@ -142,6 +150,26 @@ For example,
SPARK_HOME=~/Downloads/spark-3.2.0-bin-hadoop3.2
```
+### Flink Engine
+
+#### Setup Flink
+
+Similar to `JAVA_HOME`, you can also set `FLINK_HOME` in different ways.
However, we recommend setting it in `$KYUUBI_HOME/conf/kyuubi-env.sh` too.
+
+For example,
+
+```bash
+FLINK_HOME=~/Downloads/flink-1.14.3
+```
+
+#### Setup Kyuubi Flink Configuration
+
+To enable the Flink SQL engine, the `kyuubi.engine.type` in
`$KYUUBI_HOME/conf/kyuubi-defaults.conf` needs to be set to `FLINK_SQL`.
+
+```bash
+kyuubi.engine.type FLINK_SQL
+```
+
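If you prefer doing this from a shell, the property can be appended as sketched below; `/tmp/kyuubi-demo` is a scratch directory used purely for illustration, so point `KYUUBI_HOME` at your real installation instead:

```bash
# Append the Flink engine setting to kyuubi-defaults.conf.
# /tmp/kyuubi-demo is a throwaway path for illustration only.
KYUUBI_HOME=${KYUUBI_HOME:-/tmp/kyuubi-demo}
mkdir -p "$KYUUBI_HOME/conf"
echo 'kyuubi.engine.type FLINK_SQL' >> "$KYUUBI_HOME/conf/kyuubi-defaults.conf"
# Confirm the setting is present.
grep 'kyuubi.engine.type' "$KYUUBI_HOME/conf/kyuubi-defaults.conf"
```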
### Starting Kyuubi
```bash
@@ -195,7 +223,7 @@ bin/kyuubi run
## Using Hive Beeline
-Kyuubi server is compatible with Apache Hive beeline, so you can use
`$SPARK_HOME/bin/beeline` for testing.
+Kyuubi server is compatible with Apache Hive beeline, so you can use
`$KYUUBI_HOME/bin/beeline` for testing.
### Opening a Connection
@@ -213,7 +241,7 @@ Beeline version 2.3.7 by Apache Hive
In this case, the session will be created for the user named 'anonymous'.
-Kyuubi will create a Spark SQL engine application using
`kyuubi-spark-sql-engine_2.12-<version>.jar`.
+Kyuubi will create a Spark/Flink SQL engine application using
`kyuubi-<engine>-sql-engine_2.12-<version>.jar`.
It will take a while for the application to be ready before the session is
fully established.
Otherwise, an existing application will be reused, and the time cost here is
negligible.
@@ -225,17 +253,28 @@ bin/beeline -u 'jdbc:hive2://localhost:10009/' -n kentyao
The formerly created Spark application for user 'anonymous' will not be reused
in this case, while a brand new application will be submitted for user
'kentyao' instead.
-Then, you can see 3 processes running in your local environment, including one
`KyuubiServer` instance and 2 `SparkSubmit` instances as the SQL engines.
+Then, you can see two processes running in your local environment: one
`KyuubiServer` instance and one `SparkSubmit` or `FlinkSQLEngine` instance as
the SQL engine.
+
+- Spark
```
75730 Jps
70843 KyuubiServer
72566 SparkSubmit
-75356 SparkSubmit
+```
+
+- Flink
+
+```
+43484 Jps
+43194 KyuubiServer
+43260 FlinkSQLEngine
```
### Execute Statements
+#### Execute Spark SQL Statements
+
If the beeline session is successfully connected, you can run any query
supported by Spark SQL. For example,
```logtalk
@@ -304,6 +343,88 @@ For example, you can get the Spark web UI from the log for
debugging or tuning.

+#### Execute Flink SQL Statements
+
+If the beeline session is successfully connected, you can run any query
supported by Flink SQL. For example,
+
+```logtalk
+0: jdbc:hive2://127.0.0.1:10009/default> CREATE TABLE T (
+. . . . . . . . . . . . . . . . . . . . . . > a INT,
+. . . . . . . . . . . . . . . . . . . . . . > b VARCHAR(10)
+. . . . . . . . . . . . . . . . . . . . . . > ) WITH (
+. . . . . . . . . . . . . . . . . . . . . . > 'connector.type' =
'filesystem',
+. . . . . . . . . . . . . . . . . . . . . . > 'connector.path' =
'file:///tmp/T.csv',
+. . . . . . . . . . . . . . . . . . . . . . > 'format.type' = 'csv',
+. . . . . . . . . . . . . . . . . . . . . . > 'format.derive-schema' =
'true'
+. . . . . . . . . . . . . . . . . . . . . . > );
+16:28:47.164 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: INITIALIZED_STATE ->
PENDING_STATE, statement: CREATE TABLE T(
+a INT,
+b VARCHAR(10)
+) WITH (
+'connector.type' = 'filesystem',
+'connector.path' = 'file:///tmp/T.csv',
+'format.type' = 'csv',
+'format.derive-schema' = 'true'
+)
+16:28:47.187 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: PENDING_STATE ->
RUNNING_STATE, statement: CREATE TABLE T(
+a INT,
+b VARCHAR(10)
+) WITH (
+'connector.type' = 'filesystem',
+'connector.path' = 'file:///tmp/T.csv',
+'format.type' = 'csv',
+'format.derive-schema' = 'true'
+)
+16:28:47.320 INFO org.apache.kyuubi.operation.ExecuteStatement:
Query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f] in FINISHED_STATE
+16:28:47.322 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: RUNNING_STATE ->
FINISHED_STATE, statement: CREATE TABLE T(
+a INT,
+b VARCHAR(10)
+) WITH (
+'connector.type' = 'filesystem',
+'connector.path' = 'file:///tmp/T.csv',
+'format.type' = 'csv',
+'format.derive-schema' = 'true'
+), time taken: 0.134 seconds
++---------+
+| result |
++---------+
+| OK |
++---------+
+1 row selected (0.341 seconds)
+0: jdbc:hive2://127.0.0.1:10009/default> INSERT INTO T VALUES (1, 'Hi'), (2,
'Hello');
+16:28:52.780 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: INITIALIZED_STATE ->
PENDING_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello')
+16:28:52.786 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: PENDING_STATE ->
RUNNING_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello')
+16:28:57.827 INFO org.apache.kyuubi.operation.ExecuteStatement:
Query[d79abf78-d2ae-468f-87b2-19db1fc6e19a] in RUNNING_STATE
+16:28:59.836 INFO org.apache.kyuubi.operation.ExecuteStatement:
Query[d79abf78-d2ae-468f-87b2-19db1fc6e19a] in FINISHED_STATE
+16:28:59.837 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: RUNNING_STATE ->
FINISHED_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello'), time
taken: 7.05 seconds
++-------------------------------------+
+| default_catalog.default_database.T |
++-------------------------------------+
+| -1 |
++-------------------------------------+
+1 row selected (7.104 seconds)
+0: jdbc:hive2://127.0.0.1:10009/default>
+0: jdbc:hive2://127.0.0.1:10009/default> SELECT * FROM T;
+16:29:08.092 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: INITIALIZED_STATE ->
PENDING_STATE, statement: SELECT * FROM T
+16:29:08.101 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: PENDING_STATE ->
RUNNING_STATE, statement: SELECT * FROM T
+16:29:12.519 INFO org.apache.kyuubi.operation.ExecuteStatement:
Query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f] in FINISHED_STATE
+16:29:12.520 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing
anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: RUNNING_STATE ->
FINISHED_STATE, statement: SELECT * FROM T, time taken: 4.419 seconds
++----+--------+
+| a | b |
++----+--------+
+| 1 | Hi |
+| 2 | Hello |
++----+--------+
+2 rows selected (4.466 seconds)
+```
+
+As shown in the above case, you can retrieve the operation logs, the result
schema, and the results on the client side in the beeline console.
+
+Additionally, some useful information about the background Flink SQL
application associated with this connection is also printed in the operation
log.
+For example, you can get the Flink web UI from the log for debugging or tuning.
+
+
+
### Closing a Connection
Close the session between beeline and Kyuubi server by executing `!quit`, for
example,
@@ -339,4 +460,4 @@ Bye!
The `KyuubiServer` instance will be stopped immediately while the SQL engine's
application will still be alive for a while.
-If you start Kyuubi again before the SQL engine application terminates itself,
it will reconnect to the newly created `KyuubiServer` instance.
+If you start Kyuubi again before the SQL engine application terminates itself,
it will reconnect to the newly created `KyuubiServer` instance.
\ No newline at end of file