Maxim Khutornenko created AURORA-1459:
-----------------------------------------

             Summary: DelayExecutor is flaky within scheduling loop
                 Key: AURORA-1459
                 URL: https://issues.apache.org/jira/browse/AURORA-1459
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
            Reporter: Maxim Khutornenko


TaskGroups now uses the DelayExecutor that was introduced to gate async 
operations. The problem, though, is that the DelayExecutor queue is only 
flushed on DB transaction completion (1). This means no scheduling can proceed 
unless there is _some_ storage mutation activity. If/when there are no storage 
writes, scheduling effectively halts. 
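
A minimal sketch of the failure mode described above, not the actual Aurora 
code: GatedExecutor and Storage are hypothetical stand-ins for DelayExecutor 
and DbStorage, and their methods are assumptions made for illustration. 
Queued work only runs when a write completes, so with no writes it never runs.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

public class GatedExecutorSketch {

    /** Hypothetical gated executor: work is queued and runs only on flush(). */
    static final class GatedExecutor {
        private final Queue<Runnable> pending = new ArrayDeque<>();

        void execute(Runnable work) {
            pending.add(work);
        }

        /** Runs everything queued so far and returns how many items ran. */
        int flush() {
            int ran = 0;
            Runnable next;
            while ((next = pending.poll()) != null) {
                next.run();
                ran++;
            }
            return ran;
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Hypothetical storage: the only place that flushes the executor. */
    static final class Storage {
        private final GatedExecutor executor;

        Storage(GatedExecutor executor) {
            this.executor = executor;
        }

        /** Applies a mutation, then flushes gated work (assumed behavior). */
        void write(Runnable mutation) {
            mutation.run();
            executor.flush();
        }
    }

    public static void main(String[] args) {
        GatedExecutor executor = new GatedExecutor();
        Storage storage = new Storage(executor);
        AtomicInteger scheduled = new AtomicInteger();

        // A scheduling attempt is submitted, but nothing writes to storage.
        executor.execute(scheduled::incrementAndGet);
        System.out.println("before write: scheduled=" + scheduled.get()
            + " pending=" + executor.pendingCount());

        // Any storage mutation, even an unrelated one, releases the queue.
        storage.write(() -> { });
        System.out.println("after write: scheduled=" + scheduled.get()
            + " pending=" + executor.pendingCount());
    }
}
```

In this sketch the scheduling attempt stays queued until the unrelated 
write() call, which mirrors the stall seen when storage is idle.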

While it is unlikely to happen in production, it is consistently reproducible 
with e2e tests in vagrant on any subsequent run.


(1) - 
https://github.com/apache/aurora/blob/06ddaadbcba4c66b8019815de6ca27d50a9df77d/src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java#L175-L178



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
