azagrebin commented on a change in pull request #13730:
URL: https://github.com/apache/flink/pull/13730#discussion_r512483321



##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/SlotSharingExecutionSlotAllocator.java
##########
@@ -212,20 +214,24 @@ private ResourceProfile 
getPhysicalSlotResourceProfile(ExecutionSlotSharingGroup
                        .reduce(ResourceProfile.ZERO, (r, e) -> 
r.merge(resourceProfileRetriever.apply(e)), ResourceProfile::merge);
        }
 
-       private SharingPhysicalSlotRequestBulk 
createBulk(Map<ExecutionSlotSharingGroup, List<ExecutionVertexID>> executions) {
-               Map<ExecutionSlotSharingGroup, ResourceProfile> pendingRequests 
= executions
-                       .keySet()
-                       .stream()
-                       .collect(Collectors.toMap(
-                               group -> group,
-                               group -> 
sharedSlots.get(group).getPhysicalSlotResourceProfile()
-                       ));
+       private Optional<SharingPhysicalSlotRequestBulk> createBulk(
+                       Map<ExecutionSlotSharingGroup, List<ExecutionVertexID>> 
executions) {
+               Map<ExecutionSlotSharingGroup, ResourceProfile> pendingRequests 
= new HashMap<>();
+               for (ExecutionSlotSharingGroup group : executions.keySet()) {
+                       SharedSlot sharedSlot = sharedSlots.get(group);
+                       if (sharedSlot == null || 
sharedSlot.getSlotContextFuture().isCompletedExceptionally()) {

Review comment:
       True, I also do not see that it can happen but this is only due to the 
current internal implementation of `SlotPoolImpl`.
   Previously this happened only due to failed slot profile future and there is 
no future anymore.
   However I think this still good not to make this assumption about the 
`SlotPool` interface.
   
   As discussed offline, this is then not the full fix for the immediately 
failed physical slot future because
   we add the `SharedSlot` in `allocateLogicalSlotsFromSharedSlots` and 
continue to add logical slots to it.
   Eventually, the logical slot also fails and gets removed from `SharedSlot` 
which gets released (state RELEASED).
   The subsequent logical slot addings in the loop of 
`allocateLogicalSlotsFromSharedSlots` will fail the scheduling
   with the ALLOCATED state check, it will be RELEASED.
   
   Hence, I am excluding this hotfix from this PR and I will open another issue 
and PR for this.
   This hotfix does not affect the main fix of this PR anyways.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to