Re: [PR] [#1608][part-5] feat(spark3): always use the latest assignment and load balance for huge partition [incubator-uniffle]

via GitHub Wed, 24 Apr 2024 04:10:51 -0700


qqqttt123 commented on code in PR #1652:
URL: https://github.com/apache/incubator-uniffle/pull/1652#discussion_r1577708780



##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/writer/RssShuffleWriter.java:
##########
@@ -523,20 +532,31 @@ private void resendFailedBlocks(Set<TrackingBlockStatus> failedBlockStatusSet) {
 
     for (Map.Entry<ShuffleServerInfo, List<TrackingBlockStatus>> entry :
         faultyServerToPartitions.entrySet()) {
-      Set<Integer> partitionIds =
-          entry.getValue().stream()
-              .map(x -> x.getShuffleBlockInfo().getPartitionId())
-              .collect(Collectors.toSet());
-      ShuffleServerInfo replacement = replacementShuffleServers.get(entry.getKey().getId());
-      if (replacement == null) {
-        // todo: merge multiple requests into one.
-        replacement = reassignFaultyShuffleServer(partitionIds, entry.getKey().getId());
-        replacementShuffleServers.put(entry.getKey().getId(), replacement);
+      ShuffleServerInfo faultyServer = entry.getKey();
+      List<TrackingBlockStatus> blocks = entry.getValue();
+
+      if (!taskAttemptAssignment.isReassigned(faultyServer.getId())) {

Review Comment:
   I'm hesitant about coupling the faulty-server logic with the rebalance
   logic. If a server is faulty, we should replace it. If a server has a huge
   partition, should we use another shuffle server, or split off some of the
   tasks so that they write to another server?
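
To make the separation concrete, here is a minimal sketch of keeping the two decisions apart. Everything in it is hypothetical and for illustration only: `ReassignSketch`, `Reason`, and `pickServer` are not Uniffle classes or methods, servers are plain string ids, and the modulo-based spreading is just one possible way to send some tasks elsewhere.

```java
import java.util.List;

/** Hypothetical sketch, not the Uniffle API: one code path per reassignment reason. */
final class ReassignSketch {

  /** Why writes for a partition are being moved (assumed names). */
  enum Reason { FAULTY_SERVER, HUGE_PARTITION }

  /**
   * Picks the server a task attempt should write to.
   *
   * FAULTY_SERVER: the current server must not be used again, so every
   * task moves to the first candidate that is not the faulty server.
   *
   * HUGE_PARTITION: the current server is healthy, so it stays in the
   * candidate list and task attempts are spread across all candidates.
   */
  static String pickServer(
      Reason reason, String currentServer, List<String> candidates, int taskAttemptId) {
    if (candidates.isEmpty()) {
      throw new IllegalArgumentException("No candidate servers");
    }
    switch (reason) {
      case FAULTY_SERVER:
        return candidates.stream()
            .filter(s -> !s.equals(currentServer))
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("No healthy replacement"));
      case HUGE_PARTITION:
        // Spread by task attempt id; floorMod keeps the index non-negative.
        return candidates.get(Math.floorMod(taskAttemptId, candidates.size()));
      default:
        throw new IllegalArgumentException("Unknown reason: " + reason);
    }
  }
}
```

With separate branches, a change to how faulty servers are replaced cannot silently alter how a huge partition is spread, and vice versa.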




