This is an automated email from the ASF dual-hosted git repository.
zuston pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/uniffle.git
The following commit(s) were added to refs/heads/master by this push:
new 4290cfe79 feat(doc): Update spark related performance guide in doc (#2713)
4290cfe79 is described below
commit 4290cfe79142878638dde41beb9c7e1602f758c9
Author: Junfan Zhang <[email protected]>
AuthorDate: Wed Jan 21 15:31:48 2026 +0800
feat(doc): Update spark related performance guide in doc (#2713)
### What changes were proposed in this pull request?
Update spark related performance guide in doc
### Why are the changes needed?
To show the latest spark performance guide in doc
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Not needed; this is a documentation-only change.
---
Co-authored-by: zhangjunfan <[email protected]>
---
docs/client_guide/spark_client_guide.md | 60 ++++++++++++++++++++++++++++++---
1 file changed, 56 insertions(+), 4 deletions(-)
diff --git a/docs/client_guide/spark_client_guide.md b/docs/client_guide/spark_client_guide.md
index 2f06cda7c..d9383b14f 100644
--- a/docs/client_guide/spark_client_guide.md
+++ b/docs/client_guide/spark_client_guide.md
@@ -169,6 +169,30 @@ spark.rss.client.reassign.maxReassignServerNum 10
spark.rss.client.reassign.blockRetryMaxTimes 1
```
+### Partition split for huge partitions
+
+To address scenarios where extremely large partitions may cause OOM or disk exhaustion on the shuffle server, a partition splitting mechanism has been introduced.
+
+When a partition is identified as a huge partition, it is automatically split into multiple sub-partitions and distributed across multiple shuffle servers for writing.
+This mechanism effectively mitigates memory and disk pressure on individual shuffle servers and significantly improves the performance of tasks with highly skewed data.
+The following configurations enable this feature (it must also be activated on the shuffle-server side):
+
+#### shuffle-server side
+```bash
+# the partition size threshold that triggers a split. the following threshold is 20GB (21474836480 bytes).
+rss.server.huge-partition.split.limit 21474836480
+```
+
+#### client side
+```bash
+# whether to enable reassign mechanism
+spark.rss.client.reassign.enabled true
+# the partition split mode for huge partitions. currently supports LOAD_BALANCE and PIPELINE; LOAD_BALANCE is recommended for performance.
+spark.rss.client.reassign.partitionSplitMode LOAD_BALANCE
+# the number of load-balanced servers for one huge partition when using LOAD_BALANCE mode.
+spark.rss.client.reassign.partitionSplitLoadBalanceServerNumber 20
+```
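+
+For illustration, the client-side options above could also be passed at submission time; the application class and jar below are placeholders, not part of Uniffle:
+
+```bash
+# placeholder spark-submit invocation enabling huge-partition split on the client side
+spark-submit \
+  --conf spark.rss.client.reassign.enabled=true \
+  --conf spark.rss.client.reassign.partitionSplitMode=LOAD_BALANCE \
+  --conf spark.rss.client.reassign.partitionSplitLoadBalanceServerNumber=20 \
+  --class com.example.MyJob my-job.jar
+```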
+
### Map side combine
Map side combine is a feature for rdd aggregation operators that combines the shuffle data on map side before sending it to the shuffle server, which can reduce the amount of data transmitted and the pressure on the shuffle server.
@@ -199,8 +223,36 @@ This mechanism allows compression to overlap with upstream data reading, maximiz
The feature can be enabled or disabled through the following configuration:
-| Property Name                                          | Default | Description                                                                                                                                     |
-|--------------------------------------------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------|
-| spark.rss.client.write.overlappingCompressionEnable    | true    | Whether to overlapping compress shuffle blocks.                                                                                                 |
-| rss.client.write.overlappingCompressionThreadsPerVcore | -1      | Specifies the ratio of overlapping compression threads to Spark executor vCores. By default, all cores on the machine are used for compression. |
+| Property Name                                                | Default | Description                                                                                                                                |
+|--------------------------------------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------|
+| spark.rss.client.write.overlappingCompressionEnable          | true    | Whether to overlap compression of shuffle blocks with data reading.                                                                        |
+| spark.rss.client.write.overlappingCompressionThreadsPerVcore | -1      | The ratio of overlapping compression threads to Spark executor vCores. By default (-1), all cores on the machine are used for compression. |
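+
+For example, a `spark-defaults.conf` fragment (illustrative values; a ratio of 1 would start one compression thread per executor vCore):
+
+```bash
+spark.rss.client.write.overlappingCompressionEnable true
+spark.rss.client.write.overlappingCompressionThreadsPerVcore 1
+```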
+
+### Overlapping decompression for shuffle read
+
+This mechanism allows decompression to overlap with downstream data processing, maximizing shuffle read throughput. It can improve shuffle read speed by up to 80%, especially when reading large-scale data.
+
+| Property Name                                         | Default | Description                                                               |
+|-------------------------------------------------------|---------|---------------------------------------------------------------------------|
+| spark.rss.client.read.overlappingDecompressionEnable  | false   | Whether to overlap decompression of shuffle blocks.                       |
+| spark.rss.client.read.overlappingDecompressionThreads | 1       | Number of threads to use for overlapping decompression of shuffle blocks. |
+
+### Prefetch for shuffle read
+
+This mechanism allows prefetching shuffle data before it is needed, reducing wait time for shuffle read operations. It can improve shuffle read performance by up to 30%, especially in scenarios with high latency between Spark executors and shuffle servers.
+
+| Property Name                          | Default | Description                                  |
+|----------------------------------------|---------|----------------------------------------------|
+| spark.rss.client.read.prefetch.enabled | false   | Whether to enable prefetch for shuffle read. |
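+
+An illustrative `spark-defaults.conf` fragment enabling both read-side optimizations above (the thread count is a placeholder to tune per workload):
+
+```bash
+spark.rss.client.read.overlappingDecompressionEnable true
+spark.rss.client.read.overlappingDecompressionThreads 4
+spark.rss.client.read.prefetch.enabled true
+```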
+
+### Integrity validation for shuffle write and read processing
+
+To ensure data consistency between the Spark client and the shuffle server, integrity validation has been introduced for shuffle write and read processing.
+This feature can detect data corruption during transmission and storage, and take corresponding measures to ensure data consistency.
+The record counts of the upstream writers' partitions are tracked and validated against the record counts seen by the downstream readers; this metadata is stored on the Spark driver side.
+(Note: with a large number of tasks, this mechanism will slow down the driver and increase GC pressure.)
+
+| Property Name                                               | Default | Description                                                              |
+|-------------------------------------------------------------|---------|--------------------------------------------------------------------------|
+| spark.rss.client.integrityValidation.enabled                | false   | Whether to enable integrity validation.                                  |
+| spark.rss.client.integrityValidation.failureAnalysisEnabled | false   | Whether to print out detailed failure information on data inconsistency. |
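+
+For example, both switches can be enabled together so that any record-count mismatch is reported with details (at the cost of extra driver memory and GC pressure):
+
+```bash
+spark.rss.client.integrityValidation.enabled true
+spark.rss.client.integrityValidation.failureAnalysisEnabled true
+```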