This is an automated email from the ASF dual-hosted git repository.
ulyssesyou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new bf9736e [KYUUBI #1399] [DOCS] Add doc for engine share level
bf9736e is described below
commit bf9736e31b3a950fd7166635a9edd3908e12b31e
Author: Kent Yao <[email protected]>
AuthorDate: Tue Nov 16 19:58:25 2021 +0800
[KYUUBI #1399] [DOCS] Add doc for engine share level
### _Why are the changes needed?_
doc improvement
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including
negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate

---

- [x] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before making a pull request
Closes #1399 from yaooqinn/sldoc.
Closes #1399
d8bcce8a [Kent Yao] [DOCS] Add doc for engine share level
21e2548f [Kent Yao] [DOCS] Add doc for engine share level
bba8dc9d [Kent Yao] [DOCS] Add doc for engine share level
Authored-by: Kent Yao <[email protected]>
Signed-off-by: ulysses-you <[email protected]>
---
docs/deployment/engine_share_level.md | 168 +++++++++++++++++++++++++
docs/deployment/index.rst | 12 +-
docs/imgs/engine_share_level_connection.drawio | 1 +
docs/imgs/engine_share_level_group.drawio | 1 +
docs/imgs/engine_share_level_server.drawio | 1 +
docs/imgs/engine_share_level_user.drawio | 1 +
docs/overview/kyuubi_vs_thriftserver.md | 2 +-
7 files changed, 184 insertions(+), 2 deletions(-)
diff --git a/docs/deployment/engine_share_level.md
b/docs/deployment/engine_share_level.md
new file mode 100644
index 0000000..be5f315
--- /dev/null
+++ b/docs/deployment/engine_share_level.md
@@ -0,0 +1,168 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+<div align=center>
+
+
+
+</div>
+
+# The Share Level Of Kyuubi Engines
+
+The share level of Kyuubi engines describes the relationship between sessions
and engines.
+It determines whether a new session can share an existing backend engine with
other sessions or not.
+Sessions are the JDBC/ODBC/Thrift connections that end-users create from clients, and engines are standalone applications with the full capabilities of Spark SQL or Flink SQL (under development), running on single-node machines or clusters.
+
+The share level of Kyuubi engines works the same way in both HA and single-node mode.
+In other words, an engine is shared cluster-wide by all Kyuubi server peers whenever possible.
+
+## Why do we need this feature?
+
+Apache Spark is a unified engine for large-scale data analytics.
+Using Spark to process data is like driving a high-horsepower, all-wheel-drive supercar.
+However,
+
+- Cars have a limit on their 0-60 times.
+In a similar way, all Spark applications have to warm up before going full speed.
+- Cars have a fixed number of seats and must not be overloaded.
+Due to the master-slave architecture of Spark and the resources configured ahead of time, the overall workload of a single application is predictable.
+- Cars come in various shapes to meet our needs.
+
+With this feature, Kyuubi gives you a more flexible way to handle different big data workloads.
+
+## The currently supported share levels
+
+The currently supported share levels are:
+
+| Share Level | Semantics | Scenario | Isolation Degree | Sharability |
+| --- | --- | --- | --- | --- |
+| **CONNECTION** | One engine per session | Large-scale ETL <br/> Ad hoc | High | Low |
+| **USER** | One engine per user | Ad hoc <br/> Small-scale ETL | Medium | Medium |
+| **GROUP** | One engine per primary group | Ad hoc <br/> Small-scale ETL | Low | High |
+| **SERVER** | One engine per cluster | Admin | Highest If Secured <br/> Lowest If Unsecured | Admin ONLY If Secured |
+
+- A better isolation degree gives better stability for an engine and the queries running on it.
+- Better sharability means we are more likely to reuse an engine that is already running at full speed.
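+
+The table can be read as a rule for which sessions end up on the same engine. A minimal sketch of that rule, assuming a hypothetical `Session` record and `engine_key` helper (illustrative names, not Kyuubi's implementation): two sessions share an engine exactly when their keys are equal.
+
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical session record used only for this illustration."""
    session_id: str     # unique per JDBC/ODBC/Thrift connection
    user: str           # the session user
    primary_group: str  # the user's primary group


def engine_key(share_level: str, session: Session) -> str:
    """Return a key such that sessions with equal keys share one engine."""
    if share_level == "CONNECTION":
        # One engine per session: the key is unique to the connection.
        return f"CONNECTION/{session.session_id}"
    if share_level == "USER":
        # One engine per user.
        return f"USER/{session.user}"
    if share_level == "GROUP":
        # One engine per primary group.
        return f"GROUP/{session.primary_group}"
    if share_level == "SERVER":
        # One engine for the whole cluster.
        return "SERVER"
    raise ValueError(f"unsupported share level: {share_level}")
```
+
+For example, two sessions opened by the same user get different keys under CONNECTION but the same key under USER.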
+
+### CONNECTION
+
+<div align=center>
+
+*Figure.1 CONNECTION Share Level*
+
+</div>
+
+Each session with the CONNECTION share level has a standalone engine of its own, which is unreachable by anyone else.
+Within the session, a user or client can send multiple operation requests, including metadata calls and queries, to the corresponding engine.
+
+Although it is still an interactive form, this model also allows for more practical batch processing jobs.
+
+When the session is closed, the corresponding engine is shut down at the same time.
+
+### USER (Default)
+
+<div align=center>
+
+*Figure.2 USER Share Level*
+</div>
+
+All sessions with USER share level use the same engine if and only if the
session user is the same.
+
+Those sessions share the same engine, along with the objects belonging to its one and only `SparkContext` instance, including `Classes/Classloaders`, `SparkConf`, `Driver`/`Executor`s, `Hive Metastore Client`, etc.
+But each session can still have its own `SparkSession` instance, which contains separate session state, including temporary views, SQL configs, UDFs, etc.
+Setting `kyuubi.engine.single.spark.session` to true makes the `SparkSession` instance a singleton that is shared across sessions.
+
+When a session is closed, the corresponding engine is not shut down.
+When all sessions are closed, the corresponding engine still has a time-to-live lifespan.
+This TTL allows new sessions to be established quickly without waiting for the engine to start.
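+
+The lifecycle described above can be sketched as a small toy model. `EngineLifecycle` and its methods are hypothetical names for illustration only; the 30-minute default mirrors the `PT30M` value listed for `kyuubi.session.engine.idle.timeout`, and the real engine may track idleness differently.
+
```python
from datetime import datetime, timedelta
from typing import Optional


class EngineLifecycle:
    """Toy model of a shared engine that outlives its sessions by an idle TTL."""

    def __init__(self, started_at: datetime,
                 idle_timeout: timedelta = timedelta(minutes=30)) -> None:
        self.idle_timeout = idle_timeout
        self.open_sessions = 0
        # The engine counts as idle from startup until a session connects.
        self.idle_since: Optional[datetime] = started_at

    def open_session(self) -> None:
        self.open_sessions += 1
        self.idle_since = None  # an active session cancels the idle clock

    def close_session(self, now: datetime) -> None:
        self.open_sessions -= 1
        if self.open_sessions == 0:
            # Closing the last session starts the idle clock; it does not
            # shut the engine down immediately.
            self.idle_since = now

    def should_shut_down(self, now: datetime) -> bool:
        if self.idle_since is None:
            return False
        return now - self.idle_since >= self.idle_timeout
```
+
+A session that reconnects within the TTL resets the idle clock, which is why it can skip the engine startup cost.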
+
+### GROUP
+
+<div align=center>
+
+*Figure.3 GROUP Share Level*
+
+</div>
+
+
+An engine is shared by all sessions created by users who belong to the same primary group.
+The engine is launched with the group name as the effective username, so here the group name acts as a special user who is able to access the compute resources and data of a team.
+It follows the [Hadoop GroupsMapping](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/GroupsMapping.html) to map a user to a primary group. If the primary group is not found, it falls back to the USER level.
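+
+The fallback rule can be sketched as follows. This is an illustration under stated assumptions, not Kyuubi's code: `resolve_engine_owner` is a hypothetical helper, the mapping is a plain dictionary standing in for the Hadoop groups mapping, and the first mapped group is assumed to be the primary group.
+
```python
from typing import Mapping, Sequence, Tuple


def resolve_engine_owner(user: str,
                         groups_of: Mapping[str, Sequence[str]]) -> Tuple[str, str]:
    """Return (effective share level, engine owner) for a GROUP-level session."""
    groups = groups_of.get(user, ())
    if groups:
        # Assumption: the first mapped group is treated as the primary group.
        return ("GROUP", groups[0])
    # No primary group found: fall back to the USER level.
    return ("USER", user)
```
+
+Users with the same primary group resolve to the same owner and therefore share an engine, while a user without any group gets a per-user engine.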
+
+The mechanisms of `SparkContext`, `SparkSession` and TTL work the same way as in the USER share level.
+
+**Tips for authorization in GROUP share level**:
+
+Both the session user and the primary group name (as the sparkUser/execute user) are accessible on the engine side.
+By default, the sparkUser is used to check the YARN/HDFS ACLs.
+If you want fine-grained access control for the session user, you need to get it from `SparkContext.getLocalProperty("kyuubi.session.user")` and send it to a security service, such as Apache Ranger.
+
+### SERVER
+
+<div align=center>
+
+*Figure.4 SERVER Share Level*
+
+</div>
+
+In effect, this model is similar to Spark Thrift Server with high availability.
+
+### Subdomain
+
+For the USER, GROUP, or SERVER share levels, you can further use `kyuubi.engine.share.level.subdomain` to isolate engines.
+That is, you can create multiple engines for a single user, group, or server (cluster).
+For example, in the USER share level, you can use `kyuubi.engine.share.level.subdomain=sd1` and `kyuubi.engine.share.level.subdomain=sd2` to create two standalone engines for user `Tom`.
+
+The `kyuubi.engine.share.level.subdomain` should be configured in the JDBC connection URL to tell the Kyuubi server which engine you want to use.
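+
+A small helper can make the two-subdomain example concrete. This is a sketch under assumptions: `kyuubi_jdbc_url` is a hypothetical function, the host and port are placeholders, and the layout that appends configs after `#` separated by `;` is assumed here and should be checked against the client documentation for your Kyuubi version.
+
```python
from typing import Mapping


def kyuubi_jdbc_url(host: str, port: int, confs: Mapping[str, str],
                    database: str = "default") -> str:
    """Build a JDBC URL that carries Kyuubi configs for a single connection."""
    # Assumption: configs follow '#' and are separated by ';'.
    conf_part = ";".join(f"{key}={value}" for key, value in confs.items())
    return f"jdbc:hive2://{host}:{port}/{database};#{conf_part}"


# Two URLs that differ only in the subdomain, so that the same user is
# routed to two separate engines under the USER share level.
url_sd1 = kyuubi_jdbc_url("kyuubi.example.com", 10009, {
    "kyuubi.engine.share.level": "USER",
    "kyuubi.engine.share.level.subdomain": "sd1",
})
url_sd2 = kyuubi_jdbc_url("kyuubi.example.com", 10009, {
    "kyuubi.engine.share.level": "USER",
    "kyuubi.engine.share.level.subdomain": "sd2",
})
```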
+
+### Hybrid
+
+All supported share levels can be used together in a single Kyuubi server or
cluster.
+
+## Related Configurations
+
+- kyuubi.engine.share.level (kyuubi.session.engine.share.level)
+  - Default: USER
+  - Candidates: USER, CONNECTION, GROUP, SERVER
+  - Meaning: The base level for how an engine is created, cached, and shared among sessions.
+  - Usage: It can be set in both the server configuration file and the connection URL. The latter has higher priority.
+- kyuubi.session.engine.idle.timeout
+  - Default: PT30M (30 min)
+  - Candidates: a valid timeout duration
+  - Meaning: Time to live, counted from when the engine becomes idle
+  - Usage: It can be set in both the server configuration file and the connection URL. The latter has higher priority.
+- kyuubi.engine.share.level.subdomain (kyuubi.engine.share.level.sub.domain)
+  - Default: <none>
+  - Candidates: a valid ZooKeeper child node
+  - Meaning: Adds a subdomain under the base level to further isolate engines
+  - Usage: It can be set in both the server configuration file and the connection URL. The latter has higher priority.
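+
+The priority rule in the Usage entries can be sketched as a layered merge. `effective_conf` is a hypothetical helper, and the two defaults are the ones listed above; how the server actually resolves configs may involve more layers.
+
```python
from typing import Dict, Mapping

# Defaults as listed in the configuration entries above.
DEFAULTS: Dict[str, str] = {
    "kyuubi.engine.share.level": "USER",
    "kyuubi.session.engine.idle.timeout": "PT30M",
}


def effective_conf(server_conf: Mapping[str, str],
                   url_conf: Mapping[str, str]) -> Dict[str, str]:
    """Merge configs so that connection URL values override server values."""
    merged = dict(DEFAULTS)
    merged.update(server_conf)  # server configuration file overrides defaults
    merged.update(url_conf)     # connection URL has the highest priority
    return merged
```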
+
+## Conclusion
+
+With this feature, end-users can leverage engines in different ways to handle their different workloads, such as large-scale ETL jobs and interactive ad hoc queries.
diff --git a/docs/deployment/index.rst b/docs/deployment/index.rst
index 12ebc16..425388c 100644
--- a/docs/deployment/index.rst
+++ b/docs/deployment/index.rst
@@ -46,4 +46,14 @@ Configurations
:glob:
settings
- spark/index
\ No newline at end of file
+ spark/index
+
+Engines
+-------
+
+.. toctree::
+ :maxdepth: 2
+ :numbered: 3
+ :glob:
+
+ engine_share_level
diff --git a/docs/imgs/engine_share_level_connection.drawio
b/docs/imgs/engine_share_level_connection.drawio
new file mode 100644
index 0000000..81f0a73
--- /dev/null
+++ b/docs/imgs/engine_share_level_connection.drawio
@@ -0,0 +1 @@
+<mxfile host="Electron" modified="2021-11-15T05:59:36.083Z" agent="5.0
(Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko)
draw.io/15.4.0 Chrome/91.0.4472.164 Electron/13.5.0 Safari/537.36"
etag="gS6CfquJewT754n2nFC3" version="15.4.0" type="device"><diagram
id="RaWIMvdSY7jaPepMtQGd" name="第 1
页">7X1Zl6tG1uWv8VrdD+3FPDwyCxDzKF56MYl5BgH69Q2ZeX1Hl+0qu6o+t9K+KREEQRCxzz7nhGKnfoKZZhPGsM+VLknrnyAg2X6C2Z8gCAJQ6Hg5S/b3EhBCwfeSbCySj7LPBVbxTD8KgY/SpUjS6auKc9fVc9F/XRh3bZvG81dl
[...]
\ No newline at end of file
diff --git a/docs/imgs/engine_share_level_group.drawio
b/docs/imgs/engine_share_level_group.drawio
new file mode 100644
index 0000000..5776723
--- /dev/null
+++ b/docs/imgs/engine_share_level_group.drawio
@@ -0,0 +1 @@
+<mxfile host="Electron" modified="2021-11-15T06:35:49.668Z" agent="5.0
(Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko)
draw.io/15.4.0 Chrome/91.0.4472.164 Electron/13.5.0 Safari/537.36"
etag="hxA853FhsysEn5uKtU4c" version="15.4.0" type="device"><diagram
id="RaWIMvdSY7jaPepMtQGd" name="第 1
页">7X3ZlqPGtu3XeIx7HrYHffNIL0D0jRAvd9CJvgcB+voL2ZQrK7Ps8j4ue99zUnalRABBEDHXXA2xiF9gpl6FIegypY2T6hcIiNdfYPYXCAIBEtq/jpLtuYQE4OeCdMjjl4N+K7DyR/J65kvpnMfJ+ObAqW2rKe/eFkZt0yTR9KYs
[...]
\ No newline at end of file
diff --git a/docs/imgs/engine_share_level_server.drawio
b/docs/imgs/engine_share_level_server.drawio
new file mode 100644
index 0000000..ce8db6c
--- /dev/null
+++ b/docs/imgs/engine_share_level_server.drawio
@@ -0,0 +1 @@
+<mxfile host="Electron" modified="2021-11-15T07:06:57.521Z" agent="5.0
(Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko)
draw.io/15.4.0 Chrome/91.0.4472.164 Electron/13.5.0 Safari/537.36"
etag="Ceo0nGD68qSD25QeHOZc" version="15.4.0" type="device"><diagram
id="RaWIMvdSY7jaPepMtQGd" name="第 1
页">7XxXl6RGtu6v0VrnPowW3jziE0i85+UuXOI9JJC//kJVdaudRqO50mjOOVPdVQlBEIT59rdNxuYnmGl3YYqGQunTrPkJAtL9J5j9CYJAgITOj6vkeC8hAfi9IJ/K9KPSLwVW+co+3flRupZpNn9Vcen7ZimHrwuTvuuyZPmqLJqm
[...]
\ No newline at end of file
diff --git a/docs/imgs/engine_share_level_user.drawio
b/docs/imgs/engine_share_level_user.drawio
new file mode 100644
index 0000000..8eaf7cf
--- /dev/null
+++ b/docs/imgs/engine_share_level_user.drawio
@@ -0,0 +1 @@
+<mxfile host="Electron" modified="2021-11-15T07:05:10.317Z" agent="5.0
(Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko)
draw.io/15.4.0 Chrome/91.0.4472.164 Electron/13.5.0 Safari/537.36"
etag="iuSpQNHr9bvtRN-et5D2" version="15.4.0" type="device"><diagram
id="RaWIMvdSY7jaPepMtQGd" name="第 1
页">7XzXlqRKku3X1FozD30WWjyiAwi0DF7uQgVaQwDx9Rcys+pUVtUR3XNET09mVWaA4ziO+7ZtZh5u9glmmk0Ywz5XuiStP0FAsn2C2U8QBAIkdHycJftrCQnArwXZWCRvlX4usIpn+vnOt9KlSNLpXcW56+q56N8Xxl3bpvH8riwc
[...]
\ No newline at end of file
diff --git a/docs/overview/kyuubi_vs_thriftserver.md
b/docs/overview/kyuubi_vs_thriftserver.md
index d50e5e3..575e528 100644
--- a/docs/overview/kyuubi_vs_thriftserver.md
+++ b/docs/overview/kyuubi_vs_thriftserver.md
@@ -226,7 +226,7 @@ Inside an Engine, the Engine's user, a.k.a. `Spark User`,
will also be the same.
When an Engine runs queries received from the JDBC connection, the Engine's
user must also have rights to access the data.
Besides, if it needs access to metadata during this process, then we can also
add a fine-grained SQL standard ACL management on the metadata layer now with
[Submarine Spark Security
Plugin](https://submarine.apache.org/docs/userDocs/submarine-security/spark-security/README).
-The Engines have their lifecycle, which is related to the
`kyuubi.session.engine.share.level` specified via client configurations.
+The Engines have their lifecycle, which is related to the
`kyuubi.engine.share.level` specified via client configurations.
For example, if set to `CONNECTION`, then the corresponding Engine will be
created for each JDBC connection and terminates itself when we close the
connection.
For another example, if set to `USER`, the corresponding Engine is cached and
shared with all JDBC connections from the same user, even through different
Kyuubi servers in HA mode.
The Engine will eventually timeout after all the sessions are closed.