GitHub user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/6192#discussion_r199168161
--- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java ---
@@ -334,8 +335,11 @@ public void onContainersCompleted(final List<ContainerStatus> list) {
 			if (yarnWorkerNode != null) {
 				// Container completed unexpectedly ~> start a new one
 				final Container container = yarnWorkerNode.getContainer();
-				requestYarnContainer(container.getResource(), yarnWorkerNode.getContainer().getPriority());
-				closeTaskManagerConnection(resourceId, new Exception(containerStatus.getDiagnostics()));
+				// check WorkerRegistration status to avoid requesting containers more than required
+				if (checkWorkerRegistrationWithResourceId(resourceId)) {
--- End diff --
What we could maybe do instead of simply counting how many pending container requests we have is to ask the `SlotManager` how many pending slot allocations it has. If the number of pending slot allocations is lower than the product of slots per TaskManager and the number of pending container requests, then we would not have to restart the container.
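
For illustration, a minimal sketch of that check (all names here are hypothetical, not necessarily the actual `SlotManager` or `YarnResourceManager` API):

```java
// Sketch only: decide whether a completed container needs a replacement,
// based on pending slot allocations vs. already-requested containers.
// All parameter names are hypothetical, for illustration.
private static boolean needsReplacementContainer(
		int pendingSlotAllocations,     // e.g. what the SlotManager reports as pending
		int slotsPerTaskManager,        // configured task slots per TaskManager
		int pendingContainerRequests) { // containers requested but not yet started
	// Slots that the already-requested containers will eventually provide.
	final int slotsCoveredByPendingRequests = slotsPerTaskManager * pendingContainerRequests;
	// If the pending slot allocations are already covered by the pending
	// container requests, the completed container needs no replacement.
	return pendingSlotAllocations > slotsCoveredByPendingRequests;
}
```

At the call site in `onContainersCompleted`, `requestYarnContainer(...)` would then only be issued when this check returns true.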
---