pan3793 commented on code in PR #5942: URL: https://github.com/apache/kyuubi/pull/5942#discussion_r1441220256
##########
docs/tools/spark_block_cleaner.md:
##########
@@ -17,119 +17,5 @@

 # Kubernetes Tools Spark Block Cleaner

-## Requirements
-
-You'd better have cognition upon the following things when you want to use spark-block-cleaner.
-
-* Read this article
-* An active Kubernetes cluster
-* [Kubectl](https://kubernetes.io/docs/reference/kubectl/overview/)
-* [Docker](https://www.docker.com/)
-
-## Scenes
-
-When you're using Spark On Kubernetes with Client mode and don't use `emptyDir` for Spark `local-dir` type, you may face the same scenario that executor pods deleted without clean all the Block files. It may cause disk overflow.
-
-Therefore, we chose to use Spark Block Cleaner to clear the block files accumulated by Spark.
-
-## Principle
-
-When deploying Spark Block Cleaner, we will configure volumes for the destination folder. Spark Block Cleaner will perceive the folder by the parameter `CACHE_DIRS`.
-
-Spark Block Cleaner will clear the perceived folder in a fixed loop(which can be configured by `SCHEDULE_INTERVAL`). And Spark Block Cleaner will select folder start with `blockmgr` and `spark` for deletion using the logic Spark uses to create those folders.
-
-Before deleting those files, Spark Block Cleaner will determine whether it is a recently modified file(depending on whether the file has not been acted on within the specified time which configured by `FILE_EXPIRED_TIME`). Only delete files those beyond that time interval.
-
-And Spark Block Cleaner will check the disk utilization after clean, if the remaining space is less than the specified value(control by `FREE_SPACE_THRESHOLD`), will trigger deep clean(which file expired time control by `DEEP_CLEAN_FILE_EXPIRED_TIME`).
-
-## Usage
-
-Before you start using Spark Block Cleaner, you should build its docker images.
-
-### Build Block Cleaner Docker Image
-
-In the `KYUUBI_HOME` directory, you can use the following cmd to build docker image.
-
-```shell
-docker build ./tools/spark-block-cleaner/kubernetes/docker
-```
-
-### Modify spark-block-cleaner.yml
-
-You need to modify the `${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml` to fit your current environment.
-
-In Kyuubi tools, we recommend using `DaemonSet` to start, and we offer default yaml file in daemonSet way.
-
-Base file structure:
-
-```yaml
-apiVersion
-kind
-metadata
-  name
-  namespace
-spec
-  select
-  template
-    metadata
-    spce
-      containers
-      - image
-      - volumeMounts
-      - env
-    volumes
-```
-
-You can use affect the performance of Spark Block Cleaner through configure parameters in containers env part of `spark-block-cleaner.yml`.
-
-```yaml
-env:
-  - name: CACHE_DIRS
-    value: /data/data1,/data/data2
-  - name: FILE_EXPIRED_TIME
-    value: 604800
-  - name: DEEP_CLEAN_FILE_EXPIRED_TIME
-    value: 432000
-  - name: FREE_SPACE_THRESHOLD
-    value: 60
-  - name: SCHEDULE_INTERVAL
-    value: 3600
-```
-
-The most important thing, configure volumeMounts and volumes corresponding to Spark local-dirs.
-
-For example, Spark use /spark/shuffle1 as local-dir, you can configure like:
-
-```yaml
-volumes:
-  - name: block-files-dir-1
-    hostPath:
-      path: /spark/shuffle1
-```
-
-```yaml
-volumeMounts:
-  - name: block-files-dir-1
-    mountPath: /data/data1
-```
-
-```yaml
-env:
-  - name: CACHE_DIRS
-    value: /data/data1
-```
-
-### Start daemonSet
-
-After you finishing modifying the above, you can use the following command `kubectl apply -f ${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml` to start daemonSet.
-
-## Related parameters
-
-| Name | Default | unit | Meaning |
-|------------------------------|-------------------------|---------|-----------------------------------------------------------------------------------------------------------------------|
-| CACHE_DIRS | /data/data1,/data/data2 | | The target dirs in container path which will clean block files. |
-| FILE_EXPIRED_TIME | 604800 | seconds | Cleaner will clean the block files which current time - last modified time more than the fileExpiredTime. |
-| DEEP_CLEAN_FILE_EXPIRED_TIME | 432000 | seconds | Deep clean will clean the block files which current time - last modified time more than the deepCleanFileExpiredTime. |
-| FREE_SPACE_THRESHOLD | 60 | % | After first clean, if free Space low than threshold trigger deep clean. |
-| SCHEDULE_INTERVAL | 3600 | seconds | Cleaner sleep between cleaning. |
-
+**Note**:
+This tool been removed since Kyuubi 1.9.0.

Review Comment:
```suggestion
This tool has been removed since Kyuubi 1.9.0.
```
