This is an automated email from the ASF dual-hosted git repository.

zuston pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/uniffle.git


The following commit(s) were added to refs/heads/master by this push:
     new e787d87c9 [#2583] fix(spark): Enable taskIds filter only on AQE and multi replicas for reader (#2584)
e787d87c9 is described below

commit e787d87c96f3d040e23b366afe378b657a02453d
Author: Junfan Zhang <zus...@apache.org>
AuthorDate: Tue Aug 19 14:27:58 2025 +0800

    [#2583] fix(spark): Enable taskIds filter only on AQE and multi replicas for reader (#2584)
    
    ### What changes were proposed in this pull request?
    
    Enable the taskIds filter mechanism only when AQE skew optimization is hit or when multiple replicas are configured. Previously, the reader enabled this mechanism whenever multiple shuffle servers were assigned to it, which hurts shuffle-server performance because of the bitmap check. The filter is unnecessary when the multiple servers come from partition reassign (partition split) rather than from multiple replicas.
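
The description attributes the cost to the bitmap check. Assuming each block records the id of the task attempt that wrote it, and that the server keeps only blocks whose id is set in the expected bitmap, the check amounts to one membership lookup per block. The sketch below is hypothetical: `Block`, `filterBlocks` and the int task ids are illustrative names, and `java.util.BitSet` stands in for whatever bitmap type Uniffle actually uses.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: Block, filterBlocks and int task attempt ids are illustrative
// assumptions; java.util.BitSet stands in for the real bitmap type.
public final class TaskIdFilterSketch {

  /** A shuffle block tagged with the id of the task attempt that wrote it (simplified). */
  public static final class Block {
    public final int taskAttemptId;
    public final int sizeBytes;

    public Block(int taskAttemptId, int sizeBytes) {
      this.taskAttemptId = taskAttemptId;
      this.sizeBytes = sizeBytes;
    }
  }

  private TaskIdFilterSketch() {}

  /** Keeps blocks whose writer id is set in the expected bitmap: one lookup per block. */
  public static List<Block> filterBlocks(List<Block> blocks, BitSet expectedTaskIds) {
    List<Block> kept = new ArrayList<>();
    for (Block block : blocks) {
      if (expectedTaskIds.get(block.taskAttemptId)) {
        kept.add(block);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    BitSet expected = new BitSet();
    expected.set(1);
    expected.set(3);
    List<Block> blocks = List.of(new Block(1, 100), new Block(2, 200), new Block(3, 300));
    // Block 2 was written by a task outside the expected set, so it is dropped.
    System.out.println(filterBlocks(blocks, expected).size()); // prints 2
  }
}
```

Each lookup is cheap, but it runs for every block of every read where the filter is enabled. This is consistent with the commit narrowing the cases in which the reader turns the filter on.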
    
    ### Why are the changes needed?
    
    To fix #2583
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Existing tests
---
 .../org/apache/spark/shuffle/reader/RssShuffleReader.java  | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java b/client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
index f5563bdce..b53d67f62 100644
--- a/client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
+++ b/client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
@@ -267,11 +267,17 @@ public class RssShuffleReader<K, C> implements ShuffleReader<K, C> {
             && rssConf.getBoolean(RSS_READ_REORDER_MULTI_SERVERS_ENABLED)) {
           Collections.shuffle(shuffleServerInfoList);
         }
-        // This mechanism of expectedTaskIdsBitmap filter is to filter out the most of data.
-        // especially for AQE skew optimization
+
+        // This mechanism of taskId filter is to filter out the most of data for AQE skew and multi
+        // replicas cases
+        boolean isReplicaFilterEnabled =
+            rssConf.getInteger(
+                        RssClientConfig.RSS_DATA_REPLICA,
+                        RssClientConfig.RSS_DATA_REPLICA_DEFAULT_VALUE)
+                    > 1
+                && shuffleServerInfoList.size() > 1;
         boolean expectedTaskIdsBitmapFilterEnable =
-            !(mapStartIndex == 0 && mapEndIndex == Integer.MAX_VALUE)
-                || shuffleServerInfoList.size() > 1;
+            !(mapStartIndex == 0 && mapEndIndex == Integer.MAX_VALUE) || isReplicaFilterEnabled;
         int retryMax =
             rssConf.getInteger(
                 RssClientConfig.RSS_CLIENT_RETRY_MAX,

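The flag computed in the hunk above can be isolated as a pure function to compare the old and new behavior side by side. This is a hypothetical sketch: the class and method names are illustrative. The replica count is passed in directly instead of being read via `rssConf.getInteger(RssClientConfig.RSS_DATA_REPLICA, ...)`. As the diff's comment suggests, a map range other than `[0, Integer.MAX_VALUE)` is assumed to indicate an AQE skew split read.

```java
// Hypothetical standalone sketch of the reader-side decision in RssShuffleReader.
// Parameter names mirror the diff; class and method names are illustrative only.
public final class TaskIdsFilterDecision {

  private TaskIdsFilterDecision() {}

  /** Old behavior: a partial map range or any multi-server assignment enabled the filter. */
  public static boolean oldFilterEnabled(int mapStartIndex, int mapEndIndex, int serverCount) {
    boolean partialMapRange = !(mapStartIndex == 0 && mapEndIndex == Integer.MAX_VALUE);
    return partialMapRange || serverCount > 1;
  }

  /** New behavior: multiple servers only count when the data is also replicated. */
  public static boolean newFilterEnabled(
      int mapStartIndex, int mapEndIndex, int replica, int serverCount) {
    boolean partialMapRange = !(mapStartIndex == 0 && mapEndIndex == Integer.MAX_VALUE);
    boolean isReplicaFilterEnabled = replica > 1 && serverCount > 1;
    return partialMapRange || isReplicaFilterEnabled;
  }

  public static void main(String[] args) {
    // Several servers, single replica, full map range (e.g. after partition reassign).
    System.out.println(oldFilterEnabled(0, Integer.MAX_VALUE, 3)); // true
    System.out.println(newFilterEnabled(0, Integer.MAX_VALUE, 1, 3)); // false
    // Sub-range of map outputs (AQE skew split): the filter stays on.
    System.out.println(newFilterEnabled(0, 5, 1, 1)); // true
    // Two replicas spread over two servers: the filter stays on.
    System.out.println(newFilterEnabled(0, Integer.MAX_VALUE, 2, 2)); // true
  }
}
```

The first pair of calls shows the case the commit targets: a single-replica partition spread over several servers no longer triggers the filter.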