This is an automated email from the ASF dual-hosted git repository.
zhouky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new fb2af146b [CELEBORN-822][DOC] Add quick start guide
fb2af146b is described below
commit fb2af146bfc78ef8b07b3db664abb75988f58f51
Author: zky.zhoukeyong <[email protected]>
AuthorDate: Sat Jul 22 21:39:41 2023 +0800
[CELEBORN-822][DOC] Add quick start guide
### What changes were proposed in this pull request?
As title.

### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #1745 from waitinfuture/822.
Lead-authored-by: zky.zhoukeyong <[email protected]>
Co-authored-by: Keyong Zhou <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
---
docs/README.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 94 insertions(+), 3 deletions(-)
diff --git a/docs/README.md b/docs/README.md
index b1493b79e..4e7810017 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,7 +1,6 @@
---
hide:
- navigation
- - toc
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
@@ -17,6 +16,98 @@ license: |
See the License for the specific language governing permissions and
limitations under the License.
---
+Quick Start
+===
+This guide shows how to run Apache Spark with Apache Celeborn (Incubating).
-Apache Celeborn (Incubating)
-===
\ No newline at end of file
+## Download Celeborn
+Download the latest Celeborn binary from the [Download page](https://celeborn.apache.org/download/).
+Decompress the binary and set `$CELEBORN_HOME`:
+```shell
+tar -C <DST_DIR> -zxvf apache-celeborn-<VERSION>-bin.tgz
+export CELEBORN_HOME=<Decompressed path>
+```
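Before moving on, it can help to confirm that `CELEBORN_HOME` points at the unpacked release. The following is a minimal sketch, not part of Celeborn: `check_celeborn_home` is a hypothetical helper, and the paths it checks (`conf`, `sbin/start-master.sh`, `sbin/start-worker.sh`) are only the ones this guide itself uses.

```shell
# Hypothetical helper (not shipped with Celeborn): verify that a directory
# contains the paths this guide relies on.
check_celeborn_home() {
  cch_home=${1:-${CELEBORN_HOME:-}}
  if [ -z "$cch_home" ]; then
    echo "CELEBORN_HOME is not set" >&2
    return 1
  fi
  for cch_path in conf sbin/start-master.sh sbin/start-worker.sh; do
    if [ ! -e "$cch_home/$cch_path" ]; then
      echo "missing: $cch_home/$cch_path" >&2
      return 1
    fi
  done
  echo "CELEBORN_HOME looks usable: $cch_home"
}

# Example usage:
# check_celeborn_home "$CELEBORN_HOME"
```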
+
+## Configure Logging and Storage
+#### Configure Logging
+```shell
+cd $CELEBORN_HOME/conf
+cp log4j2.xml.template log4j2.xml
+```
+#### Configure Storage
+Configure the directory that stores shuffle data, for example `$CELEBORN_HOME/shuffle`:
+```shell
+cd $CELEBORN_HOME/conf
+echo "celeborn.worker.storage.dirs=$CELEBORN_HOME/shuffle" > celeborn-defaults.conf
+```
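Note that the `>` redirect replaces any existing `celeborn-defaults.conf`. If the file may already hold other settings, something like the sketch below could update one key and leave the rest alone. It assumes one `key=value` pair per line, as in the command above; `set_celeborn_conf` is a hypothetical helper, not a Celeborn tool.

```shell
# Hypothetical helper (not shipped with Celeborn): set KEY=VALUE in a conf file,
# replacing an earlier line for the same key instead of truncating the file.
set_celeborn_conf() {
  scc_key=$1
  scc_value=$2
  scc_file=$3
  scc_tmp=$(mktemp) || return 1
  touch "$scc_file"
  # Keep every line that does not start with "KEY=".
  awk -v prefix="${scc_key}=" 'index($0, prefix) != 1' "$scc_file" > "$scc_tmp"
  # Append the new setting, then write the result back in place.
  printf '%s=%s\n' "$scc_key" "$scc_value" >> "$scc_tmp"
  cat "$scc_tmp" > "$scc_file"
  rm -f "$scc_tmp"
}

# Example usage (assumes CELEBORN_HOME is already exported):
# set_celeborn_conf celeborn.worker.storage.dirs "$CELEBORN_HOME/shuffle" \
#   "$CELEBORN_HOME/conf/celeborn-defaults.conf"
```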
+
+## Start Celeborn Service
+#### Start Master
+```shell
+cd $CELEBORN_HOME
+./sbin/start-master.sh
+```
+You should see the `Master`'s IP and port in the log:
+```shell
+INFO [main] NettyRpcEnvFactory: Starting RPC Server [MasterSys] on 192.168.2.109:9097
+```
+#### Start Worker
+Use the Master's IP and port to start the Worker:
+```shell
+cd $CELEBORN_HOME
+./sbin/start-worker.sh celeborn://<Master IP>:<Master Port>
+```
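The Master address can also be read from the Master's startup log line rather than copied by hand. This is a minimal sketch that assumes the log line has the same shape as the `Starting RPC Server [MasterSys] on ...` sample in this guide. The guide does not state where the log file lives, so the hypothetical helper `master_url_from_log` reads the log from standard input.

```shell
# Hypothetical helper (not shipped with Celeborn): read Master log text on stdin
# and print the most recent Master address as a celeborn:// URL.
master_url_from_log() {
  sed -n 's|.*Starting RPC Server \[MasterSys\] on \([^[:space:]]*\).*|celeborn://\1|p' \
    | tail -n 1
}

# Example usage (MASTER_LOG is an assumed variable holding the Master log path):
# ./sbin/start-worker.sh "$(master_url_from_log < "$MASTER_LOG")"
```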
+You should see the following message in Worker's log:
+```shell
+23/07/22 11:39:23,546 INFO [main] MasterClient: connect to master 192.168.2.109:9097.
+23/07/22 11:39:23,673 INFO [main] Worker: Register worker successfully.
+```
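When scripting the startup, the registration message can be polled for instead of checked by eye. The sketch below assumes the Worker writes the `Register worker successfully` line shown above to a log file whose path you supply. `wait_for_log_line` is a hypothetical helper, and the 30-second default timeout is an arbitrary choice.

```shell
# Hypothetical helper (not shipped with Celeborn): poll a log file once per
# second until it contains a fixed string, or give up after TIMEOUT seconds.
wait_for_log_line() {
  wfl_file=$1
  wfl_pattern=$2
  wfl_timeout=${3:-30}
  wfl_elapsed=0
  while [ "$wfl_elapsed" -lt "$wfl_timeout" ]; do
    # -F treats the pattern as a literal string, not a regular expression.
    if [ -f "$wfl_file" ] && grep -q -F -- "$wfl_pattern" "$wfl_file"; then
      return 0
    fi
    sleep 1
    wfl_elapsed=$((wfl_elapsed + 1))
  done
  return 1
}

# Example usage (WORKER_LOG is an assumed variable holding the Worker log path):
# wait_for_log_line "$WORKER_LOG" "Register worker successfully" 60
```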
+The Master's log should show the matching registration:
+```shell
+23/07/22 11:39:23,650 INFO [dispatcher-event-loop-9] Master: Registered worker
+Host: 192.168.2.109
+RpcPort: 57806
+PushPort: 57807
+FetchPort: 57809
+ReplicatePort: 57808
+SlotsUsed: 0
+LastHeartbeat: 0
+HeartbeatElapsedSeconds: xxx
+Disks:
+ DiskInfo0: xxx
+UserResourceConsumption: empty
+WorkerRef: null
+```
+
+## Start Spark with Celeborn
+#### Copy Celeborn Client to Spark's jars
+The Celeborn release binary contains clients for Spark 2.x and Spark 3.x. Copy the client jar that matches your Spark version into Spark's `jars/` directory:
+```shell
+cp $CELEBORN_HOME/spark/<Celeborn Client Jar> $SPARK_HOME/jars/
+```
+#### Start spark-shell
+Set `spark.shuffle.manager` to Celeborn's ShuffleManager, and turn off `spark.shuffle.service.enabled`:
+```shell
+cd $SPARK_HOME
+
+./bin/spark-shell \
+--conf spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager \
+--conf spark.shuffle.service.enabled=false
+```
+Then run the following test job, which should return a count of 1000 (10 input elements, each expanded to 100 values):
+```scala
+spark.sparkContext.parallelize(1 to 10, 10)
+ .flatMap( _ => (1 to 100).iterator
+ .map(num => num)).repartition(10).count
+```
+While the Spark job runs, you should see the following message in the Celeborn Master's log:
+```shell
+Master: Offer slots successfully for 10 reducers of local-1690000152711-0 on 1 workers.
+```
+And the following message in Celeborn Worker's log:
+```shell
+23/07/22 12:29:57,952 INFO [dispatcher-event-loop-9] Controller: Reserved 10 primary location and 0 replica location for local-1690000152711-0
+23/07/22 12:29:58,117 INFO [dispatcher-event-loop-10] Controller: Start commitFiles for local-1690000152711-0
+23/07/22 12:29:58,153 INFO [async-reply] Controller: CommitFiles for local-1690000152711-0 success with 10 committed primary partitions, 0 empty primary partitions, 0 failed primary partitions, 0 committed replica partitions, 0 empty replica partitions, 0 failed replica partitions.
+```
\ No newline at end of file