GitHub user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3295#discussion_r101766842
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ---
    @@ -754,6 +759,139 @@ public void scheduleForExecution(SlotProvider slotProvider) throws JobException
                }
        }
     
    +   private void scheduleLazy(SlotProvider slotProvider) throws NoResourceAvailableException {
    +           // simply take the vertices without inputs.
    +           for (ExecutionJobVertex ejv : this.tasks.values()) {
    +                   if (ejv.getJobVertex().isInputVertex()) {
    +                           ejv.scheduleAll(slotProvider, allowQueuedScheduling);
    +                   }
    +           }
    +   }
    +
    +   /**
    +    * 
    +    * 
    +    * @param slotProvider  The resource provider from which the slots are allocated
    +    * @param timeout       The maximum time that the deployment may take, before a
    +    *                      TimeoutException is thrown.
    +    */
    +   private void scheduleEager(SlotProvider slotProvider, final Time timeout) {
    +           checkState(state == JobStatus.RUNNING, "job is not running currently");
    +
    +           // Important: reserve all the space we need up front.
    +           // that way we do not have any operation that can fail between allocating the slots
    +           // and adding them to the list. If we had a failure in between there, that would
    +           // cause the slots to get lost
    +           final ArrayList<ExecutionAndSlot[]> resources = new ArrayList<>(getNumberOfExecutionJobVertices());
    +           final boolean queued = allowQueuedScheduling;
    +
    +           // we use this flag to handle failures in a 'finally' clause
    +           // that allows us to not go through clumsy cast-and-rethrow logic
    +           boolean successful = false;
    +
    +           try {
    +                   // collecting all the slots may resize and fail in that operation without slots getting lost
    +                   final ArrayList<Future<SimpleSlot>> slotFutures = new ArrayList<>(getNumberOfExecutionJobVertices());
    +
    +                   // allocate the slots (obtain all their futures)
    +                   for (ExecutionJobVertex ejv : getVerticesTopologically()) {
    +                           // these calls are not blocking, they only return futures
    +                           ExecutionAndSlot[] slots = ejv.allocateResourcesForAll(slotProvider, queued);
    +
    +                           // we need to first add the slots to this list, to be safe on release
    +                           resources.add(slots);
    +
    +                           for (ExecutionAndSlot ens : slots) {
    +                                   slotFutures.add(ens.slotFuture);
    +                           }
    +                   }
    +
    +                   // this future is complete once all slot futures are complete.
    +                   // the future fails once one slot future fails.
    +                   final ConjunctFuture allAllocationsComplete = FutureUtils.combineAll(slotFutures);
    --- End diff --
    
    Shouldn't the `fail` operations be idempotent and only take effect for the first failure?
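
To illustrate what "idempotent, first failure wins" could look like, here is a minimal sketch. It uses the JDK's `java.util.concurrent.CompletableFuture`, not the Flink future classes from the diff. The class `IdempotentFailSketch` and its helpers `failAll` and `failureCause` are hypothetical names made up for this example. It relies only on the documented behavior that `completeExceptionally` returns `true` solely for the call that actually completes the future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class IdempotentFailSketch {

    /** Fails the future with each cause in order; returns how many calls took effect. */
    static int failAll(CompletableFuture<?> future, Throwable... causes) {
        int effective = 0;
        for (Throwable cause : causes) {
            // completeExceptionally returns true only if this call completed the future;
            // later calls are no-ops and leave the first cause in place
            if (future.completeExceptionally(cause)) {
                effective++;
            }
        }
        return effective;
    }

    /** Returns the cause the future failed with, or null if it has not failed. */
    static Throwable failureCause(CompletableFuture<?> future) {
        if (!future.isCompletedExceptionally()) {
            return null;
        }
        try {
            future.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        Throwable first = new IllegalStateException("first");
        Throwable second = new IllegalStateException("second");

        System.out.println(failAll(future, first, second));    // prints 1
        System.out.println(failureCause(future).getMessage()); // prints first
    }
}
```

With these semantics, a conjunct future that is failed by several slot futures in turn would report only the first cause, and callers would not need to guard against repeated `fail` calls.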

