[ 
https://issues.apache.org/jira/browse/AURORA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737908#comment-14737908
 ] 

Maxim Khutornenko commented on AURORA-1486:
-------------------------------------------

Thanks for reporting! This is an interesting problem, and it points to some 
kind of race condition between our task state management and async EventBus 
execution. The way I read it, the ancestor (a flapped task) somehow 
still remains in ASSIGNED when the flapping delay for its replacement is 
calculated. [~wfarner], any insight into how this could happen?
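
As a rough illustration of the kind of ordering problem suspected here, the 
sketch below models a state transition that is queued for asynchronous 
delivery while another code path reads the task's status. This is not Aurora 
code: AsyncBus, TaskStore, and ancestor_status_seen_by_replacement are 
hypothetical names, and the real scheduler may order these steps differently. 
It only shows how a reader can observe ASSIGNED if it runs before a queued 
LOST transition is applied.

```python
from collections import deque


class AsyncBus:
    """Toy event bus (hypothetical): handlers run only when drain() is called,
    which stands in for asynchronous delivery."""

    def __init__(self):
        self._pending = deque()

    def post(self, handler, *args):
        # Queue the handler instead of running it immediately.
        self._pending.append((handler, args))

    def drain(self):
        # Deliver all queued events in FIFO order.
        while self._pending:
            handler, args = self._pending.popleft()
            handler(*args)


class TaskStore:
    """Toy task store (hypothetical): maps task id to its current status."""

    def __init__(self):
        self.status = {}

    def set_status(self, task_id, status):
        self.status[task_id] = status


def ancestor_status_seen_by_replacement(drain_first):
    """Return the ancestor status a reader observes, depending on whether the
    queued LOST transition was delivered before the read."""
    store = TaskStore()
    bus = AsyncBus()
    store.set_status("ancestor", "ASSIGNED")

    # The slave is removed: the LOST transition is queued, not yet applied.
    bus.post(store.set_status, "ancestor", "LOST")

    if drain_first:
        bus.drain()

    # A flapping delay for the replacement would be derived from this value.
    seen = store.status["ancestor"]

    bus.drain()  # The transition is applied eventually either way.
    return seen
```

Calling ancestor_status_seen_by_replacement(False) models the suspected bad 
interleaving, and ancestor_status_seen_by_replacement(True) models the 
expected one. Whether the scheduler can actually reach the first ordering is 
exactly the open question above.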

> Updater hangs forever if slave removed during update
> ----------------------------------------------------
>
>                 Key: AURORA-1486
>                 URL: https://issues.apache.org/jira/browse/AURORA-1486
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 0.9.0
>            Reporter: George Sirois
>
> We have encountered several cases of server-side updates hanging indefinitely 
> if a slave is removed during the update.
> In Completed Tasks, you will generally see several consecutive LOST messages, 
> while the status of the task will show as THROTTLED forever:
> Completed Tasks:
> {code}
> 3 hours ago - LOST : Slave ec2-xx-xx-xx-xx.compute-1.amazonaws.com removed
> 09/09 17:18:20 LOCAL • THROTTLED • Rescheduled, penalized for 30000 ms for flapping
> 09/09 17:19:20 LOCAL • PENDING
> 09/09 17:19:20 LOCAL • ASSIGNED
> 09/09 17:19:48 LOCAL • LOST • Slave ec2-xx-xx-xx-xx.compute-1.amazonaws.com removed
> {code}
> Status:
> {code}
> 3 hours ago - THROTTLED : Rescheduled, penalized for 60000 ms for flapping
> 09/09 17:19:48 LOCAL • THROTTLED • Rescheduled, penalized for 60000 ms for flapping
> {code}
> The full scheduler log is available here: 
> https://gist.github.com/GeorgeSirois/021a22dae6f2544b188c
> We are running a custom build based on 0.9.0: 
> https://github.com/tellapart/aurora/commits/tellapart (SHA BC87D76)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
