This is an automated email from the ASF dual-hosted git repository.
zhouky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new 333db3971 [CELEBORN-954] Add documentation about reliable shuffle data storage
333db3971 is described below
commit 333db39713144693365c836f3b0453769f56a01b
Author: zhouyifan279 <[email protected]>
AuthorDate: Wed Sep 27 00:39:14 2023 +0800
[CELEBORN-954] Add documentation about reliable shuffle data storage
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
Yes. A new config was added in [README.md](https://github.com/apache/incubator-celeborn/blob/main/README.md#spark-configuration).
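
A minimal sketch of how the new setting might appear in `spark-defaults.conf` for Spark >= 3.5.0. Only the `spark.shuffle.sort.io.plugin.class` line comes from this change; the `spark.dynamicAllocation.enabled` line is an assumption about a typical DRA setup, added for illustration.

```properties
# From this change: lets Celeborn be used with Dynamic Resource Allocation (Spark >= 3.5.0)
spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO

# Assumption (not part of this change): the standard Spark switch that turns DRA on
spark.dynamicAllocation.enabled true
```

On Spark versions below 3.5.0 this setting alone would not be enough, since the README points those users to the Spark patches instead.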
### How was this patch tested?
Closes #1938 from zhouyifan279/reliable-storage-doc.
Authored-by: zhouyifan279 <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
---
README.md | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 0d5866061..5876f2b4f 100644
--- a/README.md
+++ b/README.md
@@ -256,6 +256,10 @@ spark.celeborn.storage.hdfs.dir hdfs://<namenode>/celeborn
# we recommend enabling aqe support to gain better performance
spark.sql.adaptive.enabled true
spark.sql.adaptive.skewJoin.enabled true
+
+# Support Spark Dynamic Resource Allocation
+# Required Spark version >= 3.5.0
+spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
```
### Deploy Flink client
@@ -302,7 +306,12 @@ Masters and works can be deployed on the same node but should not deploy multipl
See more detail in [CONFIGURATIONS](https://celeborn.apache.org/docs/latest/configuration/)
### Support Spark Dynamic Allocation
-We provide a patch to enable users to use Spark with both Dynamic Resource Allocation(DRA) and Celeborn.
+For Spark versions >= 3.5.0, Celeborn can be used with Dynamic Resource Allocation(DRA)
+when `spark.shuffle.sort.io.plugin.class` is set to `org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO`.
+Check [SPARK-42689](https://issues.apache.org/jira/browse/SPARK-42689) and [CELEBORN-911](https://issues.apache.org/jira/browse/CELEBORN-911)
+for more details.
+
+For Spark versions < 3.5.0, we provide a patch to enable users to use Spark with DRA and Celeborn.
- For Spark 2.x check [Spark2 Patch](assets/spark-patch/Celeborn_Dynamic_Allocation_spark2.patch).
- For Spark 3.0-3.3 check [Spark3 Patch](assets/spark-patch/Celeborn_Dynamic_Allocation_spark3.patch).
- For Spark 3.4 check [Spark3.4 Patch](assets/spark-patch/Celeborn_Dynamic_Allocation_spark3_4.patch).