xintongsong commented on a change in pull request #11320: 
[FLINK-16437][runtime] Make SlotManager allocate resource from ResourceManager 
at the worker granularity.
URL: https://github.com/apache/flink/pull/11320#discussion_r405329343
 
 

 ##########
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
 ##########
 @@ -1054,10 +1054,10 @@ protected abstract void internalDeregisterApplication(
         * Allocates a resource using the resource profile.
         *
         * @param resourceProfile The resource description
-        * @return Collection of {@link ResourceProfile} describing the 
launched slots
+        * @return whether the resource can be allocated
         */
        @VisibleForTesting
-       public abstract Collection<ResourceProfile> 
startNewWorker(ResourceProfile resourceProfile);
+       public abstract boolean startNewWorker(ResourceProfile resourceProfile);
 
 Review comment:
   Looking more into this, I'm not sure whether returning the container size is 
the right approach.
   
   The problem is, the container resource contains two parts that are decided 
by different components.
   - Those contained in `WorkerResourceSpec` are decided by `SlotManager`: cpu, 
task heap, task off-heap, network, managed.
   - Those contained in `TaskExecutorProcessSpec` but not in 
`WorkerResourceSpec` are decided by `ResourceManager`: framework heap, 
framework off-heap, jvm metaspace, jvm overhead.
   
   If we return the size of the entire started container to SM, it does not 
help SM to decide whether there are resources wasted, because SM is not aware 
of how many resources in addition to `WorkerResourceSpec` that the framework 
and jvm need.
   
   An alternative is to exclude the framework and jvm resources and the 
remaining to SM. The problem is that, the excluded part is not constant and 
might be dependent on `WorkerResourceSpec`. To be specific, jvm overhead might 
be configured as a fraction of total process memory size. That means increasing 
the total size of `WorkerResourceSpec` may also lead to the increasing of jvm 
overhead.
   
   I think what should be returned by this API really depends on how we want to 
solve the resource wasting issue. TBH, I'm not sure when would we have a 
`SlotManager` implementation that deals with the Yarn resource wasting problem. 
I would suggest to create a follow-up ticket and leave it to when we do have 
such a SM implementation. WDYT?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to