Maxim Khutornenko created AURORA-686:
----------------------------------------

             Summary: Job updates may fail due to exceeding role quota
                 Key: AURORA-686
                 URL: https://issues.apache.org/jira/browse/AURORA-686
             Project: Aurora
          Issue Type: Story
          Components: Scheduler
            Reporter: Maxim Khutornenko


The current way of checking job quota during in-flight updates (i.e. within the
addInstance transaction) may lead to failed updates and a poor user
experience. Since we track quota at the role level but apply the update lock
at the job level, a long-running update can always end up exceeding the allowed
quota partway through.
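
The failure mode can be illustrated with a minimal Python sketch. This is not
Aurora code: RoleQuota, add_instance, run_update and the instance counts are
hypothetical names and numbers, and quota is simplified to an instance count.
It only shows a role-level check done per instance while another job in the
same role is free to consume quota in between.

```python
class QuotaExceeded(Exception):
    pass


class RoleQuota:
    """Hypothetical role-level quota counted in instances (illustration only)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def add_instance(self, count=1):
        # Mimics a per-instance check inside an addInstance-style transaction.
        if self.used + count > self.limit:
            raise QuotaExceeded(
                f"role quota exceeded: {self.used} + {count} > {self.limit}")
        self.used += count


def run_update(quota, instances_to_add, interleaved=None):
    """Add instances one by one.

    `interleaved` maps a step index to the number of instances another job
    in the same role creates just before that step. The job-level update
    lock does nothing to prevent this.
    """
    interleaved = interleaved or {}
    added = 0
    for step in range(instances_to_add):
        if step in interleaved:
            quota.add_instance(interleaved[step])
        quota.add_instance()
        added += 1
    return added
```

With a limit of 10 and 4 instances already in use, an update adding 5
instances fits when it starts. If another job in the role adds 3 instances
midway, a later add_instance call raises QuotaExceeded and the update is left
partially applied.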

This is especially a problem with the server-side-driven update process, where a
resumed update may restart in a quite different quota environment
(e.g. due to other jobs created while the update was paused).

Possible solutions:
- per-job quota tracking - requires significant refactoring;
- hierarchical locking (e.g. add a role lock in addition to the job lock) - limits
update concurrency per role;
- front-loaded consumption (e.g. add the job's additional consumption during
startJobUpdate and re-evaluate on update completion/termination) - will require
persisting the front-loaded value within the job update schema but may be the
way to go given the current quota implementation.
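
As a rough sketch of the front-loaded option only, the Python below reserves an
update's additional consumption up front and releases the unused part when the
update ends. Everything here is an assumption made for illustration:
FrontLoadedRoleQuota, start_job_update, finish_job_update and create_job are
hypothetical names, quota is reduced to an instance count, and the in-memory
`reserved` dict stands in for the value that would have to be persisted with
the job update.

```python
class QuotaExceeded(Exception):
    pass


class FrontLoadedRoleQuota:
    """Hypothetical role quota with per-update reservations (instances only)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0        # instances actually created
        self.reserved = {}   # update_id -> instances reserved but not yet added

    def available(self):
        return self.limit - self.used - sum(self.reserved.values())

    def start_job_update(self, update_id, additional):
        # Front-load: fail fast here instead of in the middle of the update.
        if additional > self.available():
            raise QuotaExceeded(f"cannot reserve {additional} instances")
        self.reserved[update_id] = additional

    def add_instance(self, update_id):
        # Draw from the reservation, so other jobs cannot starve this update.
        if self.reserved.get(update_id, 0) <= 0:
            raise QuotaExceeded(f"no reservation left for {update_id}")
        self.reserved[update_id] -= 1
        self.used += 1

    def create_job(self, instances):
        # Other jobs in the role only see quota that is not reserved.
        if instances > self.available():
            raise QuotaExceeded(f"cannot create {instances} instances")
        self.used += instances

    def finish_job_update(self, update_id):
        # Re-evaluate on completion/termination: release what was not consumed.
        return self.reserved.pop(update_id, 0)
```

In this sketch a paused and later resumed update keeps its reservation, so the
quota environment it sees on resume is the one it was admitted under. The cost
is that reserved quota is unavailable to other jobs in the role until the
update completes or is terminated.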



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
