This is an automated email from the ASF dual-hosted git repository.
ulyssesyou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new 5b49061 [KYUUBI #815] [DOC] [KUBERNETES] Doc for spark-block-cleaner
5b49061 is described below
commit 5b49061d868000eb9a0517a6cfd8b27161d75db4
Author: Binjie Yang <[email protected]>
AuthorDate: Fri Jul 16 22:21:39 2021 +0800
[KYUUBI #815] [DOC] [KUBERNETES] Doc for spark-block-cleaner
### _Why are the changes needed?_
Add Docs for kyuubi tools spark-block-cleaner.
* Explain the parameters
* Introduction to basic startup
* Give an example
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including
negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [X] [Run test](https://kyuubi.readthedocs.io/en/latest/tools/testing.html#running-tests) locally before making a pull request
Closes #815 from zwangsheng/doc/spark_block_cleaner.
Closes #815
1ec6795f [Binjie Yang] delete todo
bbf4d6e2 [Binjie Yang] make it common
9cf3e159 [Binjie Yang] format
0803995a [Binjie Yang] straighten out the article
f834b382 [Binjie Yang] refactor
25be318f [Binjie Yang] fix
7304e595 [Binjie Yang] docs for spark-block-cleaner
Authored-by: Binjie Yang <[email protected]>
Signed-off-by: ulysses-you <[email protected]>
---
docs/{tools => develop_tools}/build_document.md | 0
docs/{tools => develop_tools}/building.md | 0
docs/{tools => develop_tools}/debugging.md | 0
docs/{tools => develop_tools}/developer.md | 0
docs/{tools => develop_tools}/distribution.md | 0
docs/{tools => develop_tools}/index.rst | 0
docs/{tools => develop_tools}/testing.md | 0
docs/index.rst | 3 +-
docs/tools/index.rst | 10 +-
docs/tools/spark_block_cleaner.md | 117 +++++++++++++++++++++
.../kubernetes/spark-block-cleaner.yml | 2 +-
11 files changed, 122 insertions(+), 10 deletions(-)
diff --git a/docs/tools/build_document.md b/docs/develop_tools/build_document.md
similarity index 100%
rename from docs/tools/build_document.md
rename to docs/develop_tools/build_document.md
diff --git a/docs/tools/building.md b/docs/develop_tools/building.md
similarity index 100%
rename from docs/tools/building.md
rename to docs/develop_tools/building.md
diff --git a/docs/tools/debugging.md b/docs/develop_tools/debugging.md
similarity index 100%
rename from docs/tools/debugging.md
rename to docs/develop_tools/debugging.md
diff --git a/docs/tools/developer.md b/docs/develop_tools/developer.md
similarity index 100%
rename from docs/tools/developer.md
rename to docs/develop_tools/developer.md
diff --git a/docs/tools/distribution.md b/docs/develop_tools/distribution.md
similarity index 100%
rename from docs/tools/distribution.md
rename to docs/develop_tools/distribution.md
diff --git a/docs/tools/index.rst b/docs/develop_tools/index.rst
similarity index 100%
copy from docs/tools/index.rst
copy to docs/develop_tools/index.rst
diff --git a/docs/tools/testing.md b/docs/develop_tools/testing.md
similarity index 100%
rename from docs/tools/testing.md
rename to docs/develop_tools/testing.md
diff --git a/docs/index.rst b/docs/index.rst
index 9978c5d..03dae70 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -90,6 +90,7 @@ Kyuubi provides both high availability and load balancing solutions based on Zoo
integrations/index
monitor/index
sql/index
+ tools/index
.. toctree::
:caption: Kyuubi Insider
@@ -101,7 +102,7 @@ Kyuubi provides both high availability and load balancing solutions based on Zoo
:caption: Contributing
:maxdepth: 2
- tools/index
+ develop_tools/index
community/index
.. toctree::
diff --git a/docs/tools/index.rst b/docs/tools/index.rst
index a24103a..2e0b02e 100644
--- a/docs/tools/index.rst
+++ b/docs/tools/index.rst
@@ -1,17 +1,11 @@
.. image:: ../imgs/kyuubi_logo.png
:align: center
-Develop Tools
+Tools
===========
.. toctree::
:maxdepth: 2
:numbered: 3
- building
- distribution
- build_document
- testing
- debugging
- community
- developer
+ spark_block_cleaner
\ No newline at end of file
diff --git a/docs/tools/spark_block_cleaner.md b/docs/tools/spark_block_cleaner.md
new file mode 100644
index 0000000..e4f2ce1
--- /dev/null
+++ b/docs/tools/spark_block_cleaner.md
@@ -0,0 +1,117 @@
+# Kubernetes Tools Spark Block Cleaner
+
+## Requirements
+
+You should be familiar with the following before using spark-block-cleaner:
+
+* Read this article
+* An active Kubernetes cluster
+* [Kubectl](https://kubernetes.io/docs/reference/kubectl/overview/)
+* [Docker](https://www.docker.com/)
+
+## Scenario
+
+When you run Spark on Kubernetes in client mode and do not use `emptyDir` volumes for the Spark `local-dir`, executor pods may be deleted without cleaning up all of their block files, which can eventually fill up the disk.
+
+Therefore, we use Spark Block Cleaner to clear the block files accumulated by Spark.
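+
+For example, after executor pods exit you may find leftover directories like these under a Spark local-dir on the host (an illustrative listing; the suffixes are random IDs):
+```shell
+$ ls /spark/shuffle1
+blockmgr-6f9ac01e-...  spark-1b2c3d4e-...
+```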
+
+## Principle
+
+When deploying Spark Block Cleaner, you configure volumes for the target directories; Spark Block Cleaner discovers those directories through the parameter `CACHE_DIRS`.
+
+Spark Block Cleaner cleans the discovered directories in a fixed loop (the interval is configured by `SCHEDULE_INTERVAL`). It selects directories whose names start with `blockmgr` or `spark` for deletion, matching the naming scheme Spark uses when it creates them.
+
+Before deleting a file, Spark Block Cleaner checks whether it was recently modified: only files that have not been touched within the time configured by `FILE_EXPIRED_TIME` are deleted.
+
+After each clean, Spark Block Cleaner checks the disk utilization. If the remaining free space is below the threshold configured by `FREE_SPACE_THRESHOLD`, it triggers a deep clean, which uses the shorter expiration time configured by `DEEP_CLEAN_FILE_EXPIRED_TIME`.
+
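+This loop can be sketched in shell as follows. This is an illustration only, not the tool's actual implementation, and it assumes GNU `find` and `df`:
+
+```shell
+while true; do
+  for dir in ${CACHE_DIRS//,/ }; do
+    # Normal clean: remove blockmgr-*/spark-* entries older than FILE_EXPIRED_TIME
+    find "$dir" -maxdepth 1 \( -name 'blockmgr-*' -o -name 'spark-*' \) \
+      -mmin +$((FILE_EXPIRED_TIME / 60)) -exec rm -rf {} +
+    # Deep clean: if free space is below FREE_SPACE_THRESHOLD percent,
+    # remove entries older than the shorter DEEP_CLEAN_FILE_EXPIRED_TIME
+    used=$(df --output=pcent "$dir" | tail -n 1 | tr -d ' %')
+    if [ $((100 - used)) -lt "$FREE_SPACE_THRESHOLD" ]; then
+      find "$dir" -maxdepth 1 \( -name 'blockmgr-*' -o -name 'spark-*' \) \
+        -mmin +$((DEEP_CLEAN_FILE_EXPIRED_TIME / 60)) -exec rm -rf {} +
+    fi
+  done
+  sleep "$SCHEDULE_INTERVAL"
+done
+```
+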
+## Usage
+
+Before you start using Spark Block Cleaner, you should build its docker images.
+
+### Build Block Cleaner Docker Image
+
+In the `KYUUBI_HOME` directory, you can use the following command to build the Docker image. Tag it so that it can be referenced from the YAML file later (`<image>` below is a placeholder for your image name):
+```shell
+docker build -t <image> ./tools/spark-block-cleaner/kubernetes/docker
+```
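+
+If your cluster pulls images from a registry, also push the tagged image there (again, `<image>` is a placeholder):
+```shell
+docker push <image>
+```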
+
+### Modify spark-block-cleaner.yml
+
+You need to modify `${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml` to fit your environment.
+
+We recommend running Spark Block Cleaner as a `DaemonSet`, and the default YAML file is written in the DaemonSet style.
+
+Basic file structure:
+```yaml
+apiVersion
+kind
+metadata
+  name
+  namespace
+spec
+  selector
+  template
+    metadata
+    spec
+      containers
+      - image
+        volumeMounts
+        env
+      volumes
+```
+
+You can tune the behavior of Spark Block Cleaner through the parameters in the containers' `env` section of `spark-block-cleaner.yml`. Note that Kubernetes requires environment variable values to be strings, so numeric values should be quoted.
+```yaml
+env:
+  - name: CACHE_DIRS
+    value: /data/data1,/data/data2
+  - name: FILE_EXPIRED_TIME
+    value: "604800"
+  - name: DEEP_CLEAN_FILE_EXPIRED_TIME
+    value: "432000"
+  - name: FREE_SPACE_THRESHOLD
+    value: "60"
+  - name: SCHEDULE_INTERVAL
+    value: "3600"
+```
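+
+With the defaults above, the cleaner runs every hour (3600 s) and removes block files older than 7 days (604800 s); if free disk space is still below 60% after a clean, it deep-cleans files older than 5 days (432000 s).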
+
+Most importantly, configure `volumeMounts` and `volumes` so that they correspond to the Spark local-dirs.
+
+For example, if Spark uses `/spark/shuffle1` as a local-dir, you can configure it like this:
+```yaml
+volumes:
+ - name: block-files-dir-1
+ hostPath:
+ path: /spark/shuffle1
+```
+```yaml
+volumeMounts:
+ - name: block-files-dir-1
+ mountPath: /data/data1
+```
+```yaml
+env:
+ - name: CACHE_DIRS
+ value: /data/data1
+```
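+
+If Spark uses several local-dirs, repeat the pattern above for each of them (for example, a second volume named `block-files-dir-2` mounted at `/data/data2`; both names here are just examples) and list all mount paths in `CACHE_DIRS`, separated by commas:
+```yaml
+env:
+  - name: CACHE_DIRS
+    value: /data/data1,/data/data2
+```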
+
+### Start the DaemonSet
+
+After you finish the modifications above, you can start the DaemonSet with the following command:
+```shell
+kubectl apply -f ${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml
+```
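+
+To verify that the cleaner pods are running, you can inspect the DaemonSet and its logs. The label selector below is an assumption based on the `block-cleaner` name in the default YAML; adjust it, and add `-n <namespace>`, to match your setup:
+```shell
+kubectl get daemonset
+kubectl logs -l name=block-cleaner
+```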
+
+## Related parameters
+
+Name | Default | Unit | Meaning
+--- | --- | --- | ---
+CACHE_DIRS | /data/data1,/data/data2 | | Comma-separated target directories (container paths) whose block files will be cleaned.
+FILE_EXPIRED_TIME | 604800 | seconds | The cleaner deletes block files whose last-modified time is more than this long ago.
+DEEP_CLEAN_FILE_EXPIRED_TIME | 432000 | seconds | A deep clean deletes block files whose last-modified time is more than this long ago.
+FREE_SPACE_THRESHOLD | 60 | % | After a normal clean, if the free space is lower than this threshold, a deep clean is triggered.
+SCHEDULE_INTERVAL | 3600 | seconds | How long the cleaner sleeps between cleaning rounds.
diff --git a/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml b/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml
index 24d3cf6..408ee18 100644
--- a/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml
+++ b/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml
@@ -32,7 +32,7 @@ spec:
name: block-cleaner
spec:
containers:
- # Container image which build by ./Dockerfile
+ # Container image built from the Dockerfile
# TODO official Image
- image: <image>
name: cleaner