[ 
https://issues.apache.org/jira/browse/AURORA-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723643#comment-14723643
 ] 

Bill Farner commented on AURORA-1459:
-------------------------------------

Good catch! I believe this is also affecting some other parts of the system.
For example, if you use {{./gradlew run}}, the scheduler aborts because it does
not register quickly enough. I'm pretty sure this is caused by the same
issue.

> DelayExecutor is flaky within scheduling loop
> ---------------------------------------------
>
>                 Key: AURORA-1459
>                 URL: https://issues.apache.org/jira/browse/AURORA-1459
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>
> TaskGroups now uses DelayExecutor, which was introduced to gate async
> operations. The problem, though, is that the DelayExecutor queue is only
> flushed on DB transaction completion (1). This means no scheduling can ever
> proceed unless there is _some_ storage mutation activity. If/when there are
> no storage writes, scheduling effectively halts.
> While this is unlikely to happen in production, it is consistently
> reproducible with e2e tests in Vagrant on any subsequent run.
> (1) - 
> https://github.com/apache/aurora/blob/06ddaadbcba4c66b8019815de6ca27d50a9df77d/src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java#L175-L178
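
The stall described above can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Aurora's code: `GatedExecutor`, `FakeStorage` and `runSchedulingRounds` are invented names. The model assumes, per the description, that the only thing draining the delayed-work queue is the completion of a storage write transaction.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

/**
 * Hypothetical model of the AURORA-1459 stall (not Aurora's real classes):
 * work handed to a gated executor is buffered and released only when a
 * storage write transaction completes.
 */
public class DelayExecutorStallDemo {

  /** Buffers submitted tasks; they run only when flush() is called. */
  static final class GatedExecutor implements Executor {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    @Override
    public synchronized void execute(Runnable task) {
      pending.add(task); // deferred: nothing runs yet
    }

    /** Runs queued tasks, including any they enqueue, until the queue is empty. */
    synchronized void flush() {
      Runnable task;
      while ((task = pending.poll()) != null) {
        task.run();
      }
    }
  }

  /** Hypothetical storage stand-in: only write transactions flush the gate. */
  static final class FakeStorage {
    private final GatedExecutor gate;

    FakeStorage(GatedExecutor gate) {
      this.gate = gate;
    }

    void read(Runnable work) {
      work.run(); // reads never trigger a flush
    }

    void write(Runnable work) {
      work.run();
      gate.flush(); // mirrors flushing on DB transaction completion
    }
  }

  /** Returns how many scheduling attempts actually ran over the given rounds. */
  public static int runSchedulingRounds(int rounds, boolean writeEachRound) {
    GatedExecutor gate = new GatedExecutor();
    FakeStorage storage = new FakeStorage(gate);
    int[] attempts = {0};
    for (int i = 0; i < rounds; i++) {
      // A TaskGroups-like component hands its scheduling work to the gate.
      gate.execute(() -> attempts[0]++);
      if (writeEachRound) {
        storage.write(() -> { });
      } else {
        storage.read(() -> { }); // quiet cluster: no storage mutations
      }
    }
    return attempts[0];
  }

  public static void main(String[] args) {
    System.out.println("no writes:   " + runSchedulingRounds(5, false));
    System.out.println("with writes: " + runSchedulingRounds(5, true));
  }
}
```

Running `main` prints 0 attempts for the read-only case and 5 when every round includes a write. The model makes the coupling visible: release of the gated work depends on an unrelated event (a storage write), so a quiet system never runs the scheduling work it has queued.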



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
