codenohup commented on code in PR #2932:
URL: https://github.com/apache/celeborn/pull/2932#discussion_r1851261674


##########
client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/AbstractRemoteShuffleEnvironment.java:
##########
@@ -129,10 +129,6 @@ public ShuffleIOOwnerContext createShuffleIOOwnerContext(
         nettyGroup.addGroup(METRIC_GROUP_INPUT));
   }
 
-  public void releasePartitionsLocally(Collection<ResultPartitionID> 
partitionIds) {
-    throw new FlinkRuntimeException("Not implemented yet.");
-  }
-
   public Collection<ResultPartitionID> getPartitionsOccupyingLocalResources() {

Review Comment:
   It may cause the TaskManager to be released unexpectedly, and the shuffle 
data on the TaskManager will be lost.
   
   The method getPartitionsOccupyingLocalResources of ShuffleEnvironment 
represents whether the TaskManager should be released when it becomes idle. In 
case the upstream vertex has finished, and the downstream vertex has not 
started due to insufficient resources, the TaskManager of the upstream vertex 
will become idle and be released unexpectedly.
   
   IMO, the method should return partitionManager#getUnreleasedPartitions, as 
the ResultPartition created by the Flink framework (such as 
SortMergeResultPartition) will be registered to the partitionManager, and the 
ResultPartition created by Celeborn (RemoteShuffleResultPartition) is not 
registered with the partitionManager, as it does not occupy TaskManager 
resources.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to