wilfred-s commented on issue #89: Core allocation/reservation logic renovation
URL: https://github.com/apache/incubator-yunikorn-core/pull/89#issuecomment-587269592
 
 
   New commits have been pushed with the smoke tests and further cleanup. To link this back to the comment from @yangwwei:
   Comment 1 (https://github.com/apache/incubator-yunikorn-core/pull/89#issuecomment-584913754), remarks:
   1. Internal unreserve: fixed the issue that was found (commit 1).
   1. `tryAllocate` now correctly unreserves (commit 2).
   1. If the reserved allocation fails, all nodes are tried (commit 1).
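
The three fixes listed above describe one control flow: try the reserved node first, release the reservation, and fall back to all nodes when the reserved allocation fails. The Go sketch below shows that flow. It makes several assumptions. The types `node`, `ask` and `scheduler` and the `unreserve` method are hypothetical simplifications: resources are a single integer and nodes are tried in list order. `tryAllocate` borrows its name from the comment, but its body here is illustrative, not the yunikorn-core implementation. Releasing the reservation before the fallback is also an assumption of this sketch.

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only; not yunikorn-core types.
type node struct {
	id        string
	available int // free resource units on the node
}

type ask struct {
	key  string
	size int // requested resource units
}

type scheduler struct {
	nodes        []*node
	reservations map[string]string // ask key -> reserved node id
}

// unreserve removes the reservation held by an ask, if there is one.
func (s *scheduler) unreserve(a ask) {
	delete(s.reservations, a.key)
}

// tryAllocate tries the reserved node first. The reservation is released
// either way (an assumption of this sketch); if the reserved node cannot
// fit the ask, every node is tried in order.
func (s *scheduler) tryAllocate(a ask) (string, bool) {
	if nodeID, ok := s.reservations[a.key]; ok {
		s.unreserve(a)
		for _, n := range s.nodes {
			if n.id == nodeID && n.available >= a.size {
				n.available -= a.size
				return n.id, true
			}
		}
	}
	// No reservation, or the reserved allocation failed: try all nodes.
	for _, n := range s.nodes {
		if n.available >= a.size {
			n.available -= a.size
			return n.id, true
		}
	}
	return "", false
}

func main() {
	s := &scheduler{
		nodes:        []*node{{id: "node-1", available: 2}, {id: "node-2", available: 8}},
		reservations: map[string]string{"ask-1": "node-1"},
	}
	nodeID, ok := s.tryAllocate(ask{key: "ask-1", size: 4})
	fmt.Println(nodeID, ok, len(s.reservations)) // node-2 true 0
}
```

The reserved `node-1` is too small, so the ask falls back to `node-2` and no reservation is left behind.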
   
   Comment 2 (https://github.com/apache/incubator-yunikorn-core/pull/89#issuecomment-585361646), remarks:
   1. Running the predicates for each allocation attempt on every node in every cycle would cause a huge slowdown. For example, take a 100-node cluster in which 99 nodes do not fit the ask. Today the predicates are checked only once, for the node that has enough resources available. If we ran them before that check and let them lead, we would run the predicates 100 times for the same allocation. Caching the predicate result is not possible, because node usage can change and with it the predicate outcome. I think suggestion 1) is thus a no-go.
   1. The score used is really basic at the moment. However, I could argue for or against any scoring approach. A large node might have a longer average runtime per allocation (service-type load) and thus release resources less often. Without metrics we really cannot argue for one or the other, or for a third alternative.
   1. Yes, we need better metrics; I will follow up with a new JIRA.
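
The cost argument in the first remark can be checked by counting predicate invocations for the 100-node example, where 99 nodes do not fit the ask. The Go sketch below does that. `runPredicates`, `fitsFirst` and `predicatesFirst` are hypothetical stand-ins, not yunikorn-core functions. The predicate always passes, so only the number of calls is measured, and the single fitting node is placed last in the list.

```go
package main

import "fmt"

// Hypothetical node model: only free resources matter for the fit check.
type node struct {
	id   string
	free int
}

// predicateCalls counts invocations of the (expensive) predicate check.
var predicateCalls int

// runPredicates stands in for the expensive predicate call; it always
// passes so that only the number of calls is measured.
func runPredicates(n *node) bool {
	predicateCalls++
	return true
}

// fitsFirst runs the cheap resource check first and only calls the
// predicates for nodes that have room for the ask.
func fitsFirst(nodes []*node, size int) *node {
	for _, n := range nodes {
		if n.free < size {
			continue
		}
		if runPredicates(n) {
			return n
		}
	}
	return nil
}

// predicatesFirst lets the predicates lead: they run on every node
// before the resource check.
func predicatesFirst(nodes []*node, size int) *node {
	for _, n := range nodes {
		if !runPredicates(n) {
			continue
		}
		if n.free >= size {
			return n
		}
	}
	return nil
}

func main() {
	nodes := make([]*node, 100)
	for i := range nodes {
		nodes[i] = &node{id: fmt.Sprintf("node-%d", i), free: 1}
	}
	nodes[99].free = 10 // only the last node fits an ask of size 4

	predicateCalls = 0
	fitsFirst(nodes, 4)
	fmt.Println("fit check first:", predicateCalls)

	predicateCalls = 0
	predicatesFirst(nodes, 4)
	fmt.Println("predicates first:", predicateCalls)
}
```

Running it prints 1 for the fit-check-first order and 100 for the predicates-first order, matching the numbers in the remark. Because the predicate outcome depends on current node usage, these calls cannot be cached across cycles.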
   
   For the test failures: I have seen a number of them, and they are transient. The tests use a manual scheduler (steps based on a counter), and I think the manual scheduling in the smoke tests is the cause of the issue. A scheduling cycle is short, and it is cut even shorter when nothing needs to be done. We probably _waste_ scheduling cycles while there is nothing to do, so when the events are processed later there are no scheduling cycles left to make progress.
   I am thinking about a better solution, or even using continuous scheduling in the smoke tests.
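
The suspected failure mode can be reproduced with a toy model of a counter-driven scheduler. The sketch below is a hypothetical simulation, not the smoke-test code. `manualScheduler`, `singleStep`, `runFixed` and `runUntil` are made-up names, and event processing is modelled as asks becoming schedulable only at a given step. A fixed step count can be used up before the events arrive. Stepping until an expected condition is met, bounded by a maximum number of steps, still makes progress. This polling approach is one option, not necessarily the solution the comment has in mind.

```go
package main

import "fmt"

// Hypothetical model of a manually stepped scheduler. pending holds asks
// that are ready to schedule; arrivals maps a step number to the number of
// asks whose events are only processed at that step.
type manualScheduler struct {
	step      int
	pending   int
	allocated int
	arrivals  map[int]int
}

// singleStep runs one scheduling cycle. A cycle with nothing pending
// returns immediately and is effectively wasted.
func (m *manualScheduler) singleStep() {
	m.step++
	m.pending += m.arrivals[m.step]
	if m.pending == 0 {
		return // nothing to do: this cycle is wasted
	}
	m.allocated += m.pending
	m.pending = 0
}

// runFixed runs exactly n cycles, like a counter-based smoke test.
func (m *manualScheduler) runFixed(n int) {
	for i := 0; i < n; i++ {
		m.singleStep()
	}
}

// runUntil keeps stepping until want allocations are seen or maxSteps
// cycles have run, and reports whether the condition was met.
func (m *manualScheduler) runUntil(want, maxSteps int) bool {
	for i := 0; i < maxSteps; i++ {
		if m.allocated >= want {
			return true
		}
		m.singleStep()
	}
	return m.allocated >= want
}

func main() {
	fixed := &manualScheduler{arrivals: map[int]int{5: 2}}
	fixed.runFixed(3)
	fmt.Println("fixed steps, allocated:", fixed.allocated)

	polled := &manualScheduler{arrivals: map[int]int{5: 2}}
	ok := polled.runUntil(2, 10)
	fmt.Println("run until done:", ok, polled.allocated)
}
```

In this run the asks arrive at step 5. `runFixed(3)` ends with 0 allocations, whereas `runUntil(2, 10)` returns true with 2 allocations.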
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
