This is an automated email from the ASF dual-hosted git repository.

ulyssesyou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git


The following commit(s) were added to refs/heads/master by this push:
     new 5b49061  [KYUUBI #815] [DOC] [KUBERNETES] Doc for spark-block-cleaner
5b49061 is described below

commit 5b49061d868000eb9a0517a6cfd8b27161d75db4
Author: Binjie Yang <[email protected]>
AuthorDate: Fri Jul 16 22:21:39 2021 +0800

    [KYUUBI #815] [DOC] [KUBERNETES] Doc for spark-block-cleaner
    
    ### _Why are the changes needed?_
    Add Docs for kyuubi tools spark-block-cleaner.
    * Explain the parameters
    * Introduction to basic startup
    * Give an example
    
    ### _How was this patch tested?_
    - [ ] Add some test cases that check the changes thoroughly including 
negative and positive cases if possible
    
    - [ ] Add screenshots for manual tests if appropriate
    
    - [X] [Run 
test](https://kyuubi.readthedocs.io/en/latest/tools/testing.html#running-tests) 
locally before make a pull request
    
    Closes #815 from zwangsheng/doc/spark_block_cleaner.
    
    Closes #815
    
    1ec6795f [Binjie Yang] delete todo
    bbf4d6e2 [Binjie Yang] make it common
    9cf3e159 [Binjie Yang] format
    0803995a [Binjie Yang] straighten out the article
    f834b382 [Binjie Yang] refactor
    25be318f [Binjie Yang] fix
    7304e595 [Binjie Yang] docs for spark-block-cleaner
    
    Authored-by: Binjie Yang <[email protected]>
    Signed-off-by: ulysses-you <[email protected]>
---
 docs/{tools => develop_tools}/build_document.md    |   0
 docs/{tools => develop_tools}/building.md          |   0
 docs/{tools => develop_tools}/debugging.md         |   0
 docs/{tools => develop_tools}/developer.md         |   0
 docs/{tools => develop_tools}/distribution.md      |   0
 docs/{tools => develop_tools}/index.rst            |   0
 docs/{tools => develop_tools}/testing.md           |   0
 docs/index.rst                                     |   3 +-
 docs/tools/index.rst                               |  10 +-
 docs/tools/spark_block_cleaner.md                  | 117 +++++++++++++++++++++
 .../kubernetes/spark-block-cleaner.yml             |   2 +-
 11 files changed, 122 insertions(+), 10 deletions(-)

diff --git a/docs/tools/build_document.md b/docs/develop_tools/build_document.md
similarity index 100%
rename from docs/tools/build_document.md
rename to docs/develop_tools/build_document.md
diff --git a/docs/tools/building.md b/docs/develop_tools/building.md
similarity index 100%
rename from docs/tools/building.md
rename to docs/develop_tools/building.md
diff --git a/docs/tools/debugging.md b/docs/develop_tools/debugging.md
similarity index 100%
rename from docs/tools/debugging.md
rename to docs/develop_tools/debugging.md
diff --git a/docs/tools/developer.md b/docs/develop_tools/developer.md
similarity index 100%
rename from docs/tools/developer.md
rename to docs/develop_tools/developer.md
diff --git a/docs/tools/distribution.md b/docs/develop_tools/distribution.md
similarity index 100%
rename from docs/tools/distribution.md
rename to docs/develop_tools/distribution.md
diff --git a/docs/tools/index.rst b/docs/develop_tools/index.rst
similarity index 100%
copy from docs/tools/index.rst
copy to docs/develop_tools/index.rst
diff --git a/docs/tools/testing.md b/docs/develop_tools/testing.md
similarity index 100%
rename from docs/tools/testing.md
rename to docs/develop_tools/testing.md
diff --git a/docs/index.rst b/docs/index.rst
index 9978c5d..03dae70 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -90,6 +90,7 @@ Kyuubi provides both high availability and load balancing 
solutions based on Zoo
    integrations/index
    monitor/index
    sql/index
+   tools/index
 
 .. toctree::
    :caption: Kyuubi Insider
@@ -101,7 +102,7 @@ Kyuubi provides both high availability and load balancing 
solutions based on Zoo
    :caption: Contributing
    :maxdepth: 2
 
-   tools/index
+   develop_tools/index
    community/index
 
 .. toctree::
diff --git a/docs/tools/index.rst b/docs/tools/index.rst
index a24103a..2e0b02e 100644
--- a/docs/tools/index.rst
+++ b/docs/tools/index.rst
@@ -1,17 +1,11 @@
 .. image:: ../imgs/kyuubi_logo.png
    :align: center
 
-Develop Tools
+Tools
 ===========
 
 .. toctree::
     :maxdepth: 2
     :numbered: 3
 
-    building
-    distribution
-    build_document
-    testing
-    debugging
-    community
-    developer
+    spark_block_cleaner
\ No newline at end of file
diff --git a/docs/tools/spark_block_cleaner.md 
b/docs/tools/spark_block_cleaner.md
new file mode 100644
index 0000000..e4f2ce1
--- /dev/null
+++ b/docs/tools/spark_block_cleaner.md
@@ -0,0 +1,117 @@
+<div align=center>
+
+![](../imgs/kyuubi_logo.png)
+
+</div>
+
+# Kubernetes Tools Spark Block Cleaner
+
+## Requirements
+
+You should be familiar with the following before using spark-block-cleaner:
+
+* Read this article
+* An active Kubernetes cluster
+* [Kubectl](https://kubernetes.io/docs/reference/kubectl/overview/)
+* [Docker](https://www.docker.com/)
+
+## Scenarios
+
+When you use Spark on Kubernetes in client mode without `emptyDir` volumes for Spark's `local-dir`, executor pods may be deleted without cleaning up all of their block files, which can eventually fill the disk.
+
+Spark Block Cleaner clears the block files that Spark leaves behind in this scenario.
+
+## Principle
+
+When deploying Spark Block Cleaner, you mount the target directories as volumes; Spark Block Cleaner discovers them through the `CACHE_DIRS` parameter.
+
+Spark Block Cleaner scans these directories in a fixed loop (the interval is configured by `SCHEDULE_INTERVAL`) and selects folders whose names start with `blockmgr` or `spark` for deletion, matching the naming convention Spark uses when creating them.
+
+Before deleting a file, Spark Block Cleaner checks its last-modified time and only deletes files that have not been modified within the interval configured by `FILE_EXPIRED_TIME`.
+
+After each cleaning pass, Spark Block Cleaner checks disk utilization; if the free space is below the threshold set by `FREE_SPACE_THRESHOLD`, it triggers a deep clean, which uses the shorter expiration interval `DEEP_CLEAN_FILE_EXPIRED_TIME`.
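The cleaning policy described above can be sketched in Python (a hypothetical illustration of the documented behavior, not the tool's actual implementation; the constants mirror the documented defaults):

```python
import os
import shutil
import time

# Values mirror the documented defaults for the env parameters.
FILE_EXPIRED_TIME = 604800             # normal clean: 7 days, in seconds
DEEP_CLEAN_FILE_EXPIRED_TIME = 432000  # deep clean: 5 days, in seconds
FREE_SPACE_THRESHOLD = 60              # deep clean below 60% free space

def clean_cache_dir(cache_dir, expired_time):
    """Delete blockmgr-*/spark-* entries not modified within expired_time."""
    now = time.time()
    for name in os.listdir(cache_dir):
        # Only touch folders Spark itself creates for block data.
        if not (name.startswith("blockmgr") or name.startswith("spark")):
            continue
        path = os.path.join(cache_dir, name)
        if now - os.path.getmtime(path) > expired_time:
            shutil.rmtree(path, ignore_errors=True)

def free_space_percent(path):
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100

def one_pass(cache_dir):
    # Normal clean first; escalate to a deep clean if the disk is still full.
    clean_cache_dir(cache_dir, FILE_EXPIRED_TIME)
    if free_space_percent(cache_dir) < FREE_SPACE_THRESHOLD:
        clean_cache_dir(cache_dir, DEEP_CLEAN_FILE_EXPIRED_TIME)
```

A real deployment would run such a pass for each directory in `CACHE_DIRS`, sleeping `SCHEDULE_INTERVAL` seconds between passes.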
+
+## Usage
+
+Before you start using Spark Block Cleaner, you should build its Docker image.
+
+### Build Block Cleaner Docker Image
+
+In the `KYUUBI_HOME` directory, you can build the Docker image with:
+```shell
+docker build -t <image> ./tools/spark-block-cleaner/kubernetes/docker
+```
+Tag the image (`-t`) so it can be referenced by the `image` field in `spark-block-cleaner.yml`.
+
+### Modify spark-block-cleaner.yml
+
+You need to modify the 
`${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml` 
to fit your current environment.
+
+We recommend deploying the tool as a `DaemonSet`, and the default yaml file is provided in that form.
+
+The base file structure:
+```yaml
+apiVersion
+kind
+metadata
+  name
+  namespace
+spec
+  selector
+  template
+    metadata
+    spec
+      containers
+      - image
+      - volumeMounts
+      - env
+    volumes
+```
+
+You can tune the behavior of Spark Block Cleaner through the parameters in the container `env` section of `spark-block-cleaner.yml`:
+```yaml
+env:
+  - name: CACHE_DIRS
+    value: /data/data1,/data/data2
+  - name: FILE_EXPIRED_TIME
+    value: 604800
+  - name: DEEP_CLEAN_FILE_EXPIRED_TIME
+    value: 432000
+  - name: FREE_SPACE_THRESHOLD
+    value: 60
+  - name: SCHEDULE_INTERVAL
+    value: 3600
+```
+
+Most importantly, configure `volumes` and `volumeMounts` to correspond to Spark's local-dirs.
+
+For example, if Spark uses `/spark/shuffle1` as a local-dir, you can configure:
+```yaml
+volumes:
+  - name: block-files-dir-1
+    hostPath:
+      path: /spark/shuffle1
+```
+```yaml
+volumeMounts:
+  - name: block-files-dir-1
+    mountPath: /data/data1
+```
+```yaml
+env:
+  - name: CACHE_DIRS
+    value: /data/data1
+```
+
+### Start the DaemonSet
+
+After finishing the modifications above, start the DaemonSet with `kubectl apply -f ${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml`.
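To verify that the DaemonSet came up, a few standard `kubectl` checks can help (the `name=block-cleaner` label comes from the default yaml; `<namespace>` and the pod name are placeholders for your environment):

```shell
# List the DaemonSet and its pods
kubectl get daemonset -n <namespace>
kubectl get pods -n <namespace> -l name=block-cleaner

# Follow the cleaner logs from one pod
kubectl logs -f <block-cleaner-pod-name> -n <namespace>
```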
+
+## Related parameters
+
+Name | Default | Unit | Meaning
+--- | --- | --- | ---
+CACHE_DIRS | /data/data1,/data/data2 | | The target directories (container paths) whose block files will be cleaned.
+FILE_EXPIRED_TIME | 604800 | seconds | During a normal clean, delete block files whose last-modified time is more than this long ago.
+DEEP_CLEAN_FILE_EXPIRED_TIME | 432000 | seconds | During a deep clean, delete block files whose last-modified time is more than this long ago.
+FREE_SPACE_THRESHOLD | 60 | % | After a normal clean, trigger a deep clean if free space is below this threshold.
+SCHEDULE_INTERVAL | 3600 | seconds | Time the cleaner sleeps between cleaning passes.
diff --git a/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml 
b/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml
index 24d3cf6..408ee18 100644
--- a/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml
+++ b/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml
@@ -32,7 +32,7 @@ spec:
         name: block-cleaner
     spec:
       containers:
-        # Container image which build by ./Dockerfile
+        # Container image which build by Dockerfile
         # TODO official Image
         - image: <image>
           name: cleaner
