Re: [PR] [#1888] feat(server): Reject sendShuffleData for an application if one of the partitions exceeds the limit [incubator-uniffle]

via GitHub Sun, 04 Aug 2024 20:09:30 -0700


rickyma commented on code in PR #1889:
URL: 
https://github.com/apache/incubator-uniffle/pull/1889#discussion_r1703482744



##########
docs/server_guide.md:
##########
@@ -125,7 +125,7 @@ This document will introduce how to deploy Uniffle shuffle 
servers.
 | rss.server.health.checker.script.execute.timeout | 5000    | Timeout for 
`HealthScriptChecker` execute health script.(ms)                                
                                                                                
                |
 
 ### Huge Partition Optimization
-A huge partition is a common problem for Spark/MR and so on, caused by data 
skew. And it can cause the shuffle server to become unstable. To solve this, we 
introduce some mechanisms to limit the writing of huge partitions to avoid 
affecting regular partitions, more details can be found in 
[ISSUE-378](https://github.com/apache/incubator-uniffle/issues/378). The basic 
rules for limiting large partitions are memory usage limits and flushing 
individual buffers directly to persistent storage.
+A huge partition is a common problem for Spark/MR and so on, caused by data 
skew. And it can cause the shuffle server to become unstable. To solve this, we 
introduce some mechanisms to limit the writing of huge partitions to avoid 
affecting regular partitions, and introduce a hard limit config to reject 
extreme huge partition, more details can be found in 
[ISSUE-378](https://github.com/apache/incubator-uniffle/issues/378). The basic 
rules for limiting large partitions are memory usage limits and flushing 
individual buffers directly to persistent storage.

Review Comment:
   extreme huge partition
   ->
   extremely huge partitions



##########
docs/server_guide.md:
##########
@@ -144,6 +144,11 @@ For HADOOP FS, the conf value of 
`rss.server.single.buffer.flush.threshold` shou
 
 Finally, to improve the speed of writing to HDFS for a single partition, the 
value of `rss.server.max.concurrency.of.per-partition.write` and 
`rss.server.flush.hdfs.threadPool.size` could be increased to 50 or 100.
 
+#### Hard limit
+Once the huge partition reach the hard limit size, the conf is 
`rss.server.huge-partition.size.hard.limit`, server reject the sendShuffleData 
request and do not retry for client, so that client can fail fast and user can 
modify their sql or job to avoid the reach the partition hard limit.

Review Comment:
   ```
   Once the huge partition reaches the hard limit size, which is set by the 
configuration `rss.server.huge-partition.size.hard.limit`, the server will 
reject the `sendShuffleData` request and the client will not retry. This allows 
the client to fail fast and enables the user to modify their SQLs or jobs to 
avoid reaching the partition hard limit.
   
   For example, if the hard limit is set to 50g, the server will reject the 
request if the partition size is greater than 50g, causing the job to 
eventually fail.
   ```



##########
server/src/main/java/org/apache/uniffle/server/ShuffleTaskManager.java:
##########
@@ -517,22 +524,34 @@ public long getPartitionDataSize(String appId, int 
shuffleId, int partitionId) {
   }
 
   public long requireBuffer(
-      String appId, int shuffleId, List<Integer> partitionIds, int 
requireSize) {
+      String appId,
+      int shuffleId,
+      List<Integer> partitionIds,
+      List<Integer> partitionRequireSizes,
+      int requireSize) {
     ShuffleTaskInfo shuffleTaskInfo = shuffleTaskInfos.get(appId);
     if (null == shuffleTaskInfo) {
       LOG.error("No such app is registered. appId: {}, shuffleId: {}", appId, 
shuffleId);
       throw new NoRegisterException("No such app is registered. appId: " + 
appId);
     }
-    for (int partitionId : partitionIds) {
-      long partitionUsedDataSize = getPartitionDataSize(appId, shuffleId, 
partitionId);
-      if (shuffleBufferManager.limitHugePartition(
-          appId, shuffleId, partitionId, partitionUsedDataSize)) {
-        String errorMessage =
-            String.format(
-                "Huge partition is limited to writing. appId: %s, shuffleId: 
%s, partitionIds: %s, partitionUsedDataSize: %s",
-                appId, shuffleId, partitionIds, partitionUsedDataSize);
-        LOG.error(errorMessage);
-        throw new NoBufferForHugePartitionException(errorMessage);
+    // adapt to legacy client which have empty partitionRequireSizes

Review Comment:
   To be compatible with legacy clients which have empty partitionRequireSizes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [#1888] feat(server): Reject sendShuffleData for an application if one of the partitions exceeds the limit [incubator-uniffle]

Reply via email to