This is an automated email from the ASF dual-hosted git repository. binjieyang pushed a commit to branch doc/deploy_on_k8s in repository https://gitbox.apache.org/repos/asf/incubator-celeborn-website.git
commit cb73f85263a2fa7289591c5c5e2374c53f1b2100
Author: zwangsheng <[email protected]>
AuthorDate: Tue Apr 4 17:58:59 2023 +0800

    Add Doc about Deploy Celeborn On Kubernetes
---
 .gitignore                        |   1 +
 docs/docs/latest/deploy_on_k8s.md | 121 ++++++++++++++++++++++++++++++++++++++
 mkdocs.yml                        |   1 +
 3 files changed, 123 insertions(+)

diff --git a/.gitignore b/.gitignore
index 93cf26c..5e00954 100644
--- a/.gitignore
+++ b/.gitignore
@@ -16,3 +16,4 @@
 .pydevproject
 .python-version
 .settings
+/site/
diff --git a/docs/docs/latest/deploy_on_k8s.md b/docs/docs/latest/deploy_on_k8s.md
new file mode 100644
index 0000000..1f04769
--- /dev/null
+++ b/docs/docs/latest/deploy_on_k8s.md
@@ -0,0 +1,121 @@
+---
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# Deploy Celeborn On Kubernetes
+
+Celeborn currently supports rapid deployment using Helm.
+
+## Before Deploy
+
+1. You should have a running Kubernetes cluster.
+2. You should have a basic understanding of Kubernetes deployment concepts,
+   e.g. [Kubernetes Resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/).
+3. You should have enough
+   [permissions to create resources](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/).
+4. You should have [Helm](https://helm.sh/docs/intro/install/) installed.
+
+## Deploy
+
+### 1. Get Celeborn Binary Package
+
+You can find released versions of Celeborn at https://celeborn.apache.org/download/.
+
+Alternatively, you can build a binary package from the master branch or your own branch by running
+`./build/make-distribution.sh` in the source tree.
+
+Either way, unpack the binary package and change into its directory.
+
+### 2. Modify Celeborn Configurations
+
+> Notice: The Celeborn Helm chart templates are still experimental and may be adjusted in
+> subsequent releases.
+
+The settings in `./charts/celeborn/values.yaml` that you should focus on are:
+
+* image repository - the repository to pull images from
+* image tag - the image version to use
+* masterReplicas - the number of Celeborn master replicas
+* workerReplicas - the number of Celeborn worker replicas
+* celeborn `celeborn.worker.storage.dirs` - which disks to mount for Celeborn workers (for more
+  information, see [HostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath))
+
+### 3. Helm Install Celeborn Charts
+
+For more details, see [Helm Install](https://helm.sh/docs/helm/helm_install/).
+
+```
+cd ./charts/celeborn
+
+helm install celeborn -n <namespace> .
+```
+
+### 4. Check Celeborn
+
+After installation, you should be able to find the Celeborn Master/Worker pods
+with `kubectl get pods -n <namespace>`.
+
+For example:
+
+```
+NAME                READY   STATUS    RESTARTS   AGE
+celeborn-master-0   1/1     Running   0          1m
+...
+celeborn-worker-0   1/1     Running   0          1m
+...
+```
+
+Because Celeborn Master/Worker pods take time to start, you may see output like the following:
+
+```
+** server can't find celeborn-master-0.celeborn-master-svc.default.svc.cluster.local: NXDOMAIN
+
+waiting for master
+Server: 172.17.0.10
+Address: 172.17.0.10#53
+
+...
+
+Name: celeborn-master-0.celeborn-master-svc.default.svc.cluster.local
+Address: 10.225.139.80
+
+Server: 172.17.0.10
+Address: 172.17.0.10#53
+
+starting org.apache.celeborn.service.deploy.master.Master, logging to /opt/celeborn/logs/celeborn--org.apache.celeborn.service.deploy.master.Master-1-celeborn-master-0.out
+
+...
+
+23/03/23 14:10:56,081 INFO [main] RaftServer: 0: start RPC server
+23/03/23 14:10:56,132 INFO [nioEventLoopGroup-2-1] LoggingHandler: [id: 0x83032bf1] REGISTERED
+23/03/23 14:10:56,132 INFO [nioEventLoopGroup-2-1] LoggingHandler: [id: 0x83032bf1] BIND: 0.0.0.0/0.0.0.0:9872
+23/03/23 14:10:56,134 INFO [nioEventLoopGroup-2-1] LoggingHandler: [id: 0x83032bf1, L:/0:0:0:0:0:0:0:0:9872] ACTIVE
+23/03/23 14:10:56,135 INFO [JvmPauseMonitor0] JvmPauseMonitor: JvmPauseMonitor-0: Started
+23/03/23 14:10:56,208 INFO [main] Master: Metrics system enabled.
+23/03/23 14:10:56,216 INFO [main] HttpServer: master: HttpServer started on port 9098.
+23/03/23 14:10:56,216 INFO [main] Master: Master started.
+```
+
+### 5. Build Celeborn Client
+
+Without going into detail on how to configure Spark/Flink to find the Celeborn Master/Worker, the key
+configuration is:
+
+```
+spark.celeborn.master.endpoints: celeborn-master-0.celeborn-master-svc.default:9097,celeborn-master-1.celeborn-master-svc.default:9097,celeborn-master-2.celeborn-master-svc.default:9097
+```
+
+> Notice: Make sure that Spark/Flink can reach the Celeborn Master/Worker via IP or via the Kubernetes DNS names shown
+> above.
diff --git a/mkdocs.yml b/mkdocs.yml
index c454aa4..925bb34 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -79,6 +79,7 @@ nav:
   - Documentation:
     - Latest:
       - Deploy: docs/latest/deploy.md
+      - Deploy On Kubernetes: docs/latest/deploy_on_k8s.md
       - Configuration:
         - client: docs/latest/configuration/client.md
         - master: docs/latest/configuration/master.md
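The `spark.celeborn.master.endpoints` value in step 5 lists one `<pod>.<service>.<namespace>:<port>` entry per master replica, matching the DNS names in the startup log. The following is a minimal shell sketch that derives this value from `masterReplicas` and the install namespace. It assumes the chart's default naming, with `celeborn-master-<i>` pods behind the `celeborn-master-svc` service, and the port 9097 used in the example config. The function name `celeborn_master_endpoints` is hypothetical and is not part of Celeborn or its Helm chart.

```shell
#!/bin/sh
# Hypothetical helper (not part of Celeborn or its Helm chart).
# Builds a comma-separated spark.celeborn.master.endpoints value, ASSUMING the
# chart's default pod names (celeborn-master-<i>) and service name
# (celeborn-master-svc), as seen in the example output above.
celeborn_master_endpoints() {
  _replicas="$1"        # should match masterReplicas in values.yaml
  _namespace="$2"       # namespace passed to `helm install -n`
  _port="${3:-9097}"    # master port from the example config; defaults to 9097
  _endpoints=""
  _i=0
  while [ "$_i" -lt "$_replicas" ]; do
    _host="celeborn-master-${_i}.celeborn-master-svc.${_namespace}:${_port}"
    # Append with a comma separator, but no leading comma for the first entry.
    _endpoints="${_endpoints:+${_endpoints},}${_host}"
    _i=$((_i + 1))
  done
  printf '%s\n' "$_endpoints"
}

# Example: three masters in the "default" namespace.
echo "spark.celeborn.master.endpoints: $(celeborn_master_endpoints 3 default)"
```

For three masters in `default`, this reproduces the endpoint list shown in step 5. Before relying on these names, check the actual pod and service names with `kubectl get pods,svc -n <namespace>`, because a different release name or chart version may name them differently.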
