Maxim Khutornenko created AURORA-1459:
-----------------------------------------
Summary: DelayExecutor is flaky within scheduling loop
Key: AURORA-1459
URL: https://issues.apache.org/jira/browse/AURORA-1459
Project: Aurora
Issue Type: Bug
Components: Scheduler
Reporter: Maxim Khutornenko
TaskGroups now uses the DelayExecutor that was introduced to gate async operations. The
problem, though, is that the DelayExecutor queue is only flushed on DB transaction
completion (1). This means no scheduling can ever proceed unless there is
_some_ storage mutation activity. If/when there are no storage writes,
scheduling effectively halts.
While this is unlikely to happen in production, it is consistently reproducible with
e2e tests in vagrant on any subsequent run.
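A minimal, self-contained model of the failure mode described above may help. It assumes only what the report states: work submitted to the gated executor is buffered and released solely when a storage write transaction completes. The class names (GatedExecutorDemo, GatedExecutor, FakeStorage) are hypothetical stand-ins. They are not Aurora's actual DelayExecutor or DbStorage code, and this sketch does not propose a fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;

/**
 * Hypothetical model of a "gated" executor: tasks passed to execute() are
 * buffered and only run when flush() is called. In the report the flush
 * happens on DB transaction completion; here that is modeled by
 * FakeStorage.write(). Names are illustrative, not Aurora's real classes.
 */
public class GatedExecutorDemo {

  static final class GatedExecutor implements Executor {
    private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();

    @Override
    public void execute(Runnable task) {
      queue.add(task); // deferred: nothing runs yet
    }

    /** Runs every task queued so far, in submission order. */
    void flush() {
      Runnable task;
      while ((task = queue.poll()) != null) {
        task.run();
      }
    }

    int pending() {
      return queue.size();
    }
  }

  /** Hypothetical storage stand-in: each completed write flushes the executor. */
  static final class FakeStorage {
    private final GatedExecutor gate;

    FakeStorage(GatedExecutor gate) {
      this.gate = gate;
    }

    void write(Runnable mutation) {
      mutation.run();
      gate.flush(); // in this model, the only caller of flush()
    }
  }

  public static void main(String[] args) {
    GatedExecutor gate = new GatedExecutor();
    FakeStorage storage = new FakeStorage(gate);
    List<String> scheduled = new ArrayList<>();

    // A scheduling attempt is submitted through the gated executor.
    gate.execute(() -> scheduled.add("task-1"));
    System.out.println("without writes: scheduled=" + scheduled + " pending=" + gate.pending());

    // Any unrelated storage write finally releases the queued work.
    storage.write(() -> { });
    System.out.println("after a write:  scheduled=" + scheduled + " pending=" + gate.pending());
  }
}
```

Running main prints an empty scheduled list with one pending task before the write, and the task appears only after the unrelated write. In a quiet cluster with no storage writes, the second step never happens, which matches the stall described in the report.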
(1) -
https://github.com/apache/aurora/blob/06ddaadbcba4c66b8019815de6ca27d50a9df77d/src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java#L175-L178
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)