[ https://issues.apache.org/jira/browse/AURORA-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008005#comment-14008005 ]
Bill Farner commented on AURORA-470:
------------------------------------
Is there anything interesting in scheduler logs around 20:53:47 (when the
throttling penalty should have expired)? Depending on the type of issue, you
might not see the task ID (e.g. if there was an NPE). You mentioned that there
was a scheduler failover around this time, so you'll want to look at logs on
the newly-elected leader to see if there's anything suspicious there.
The only commit since then that is possibly relevant is this one:
https://reviews.apache.org/r/20066. So there's a chance this is already fixed,
but I'd love to know if you find anything in the logs before we consider
closing this.
If you do run into this again, restarting the scheduler should clear it up (a
band-aid only, obviously). Of course, please do let us know if you see it
again, especially on a more recent SHA.
{quote}
Another interesting thing to note is that the task reported as taking 989 msecs
to run actually took much longer
{quote}
Likely a red herring, at least as it applies to the THROTTLED behavior. Happy
to accept another ticket for that. Useful debugging info would be a screen
capture with the '+' clicked to show the scheduler's perspective of state
transitions. Output from grepping for a task ID on the scheduler and slave
would be super useful.
> Tasks get stuck in THROTTLED state on restart or leader change
> --------------------------------------------------------------
>
> Key: AURORA-470
> URL: https://issues.apache.org/jira/browse/AURORA-470
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Affects Versions: 0.5.0
> Reporter: Nathan Howell
>
> We're seeing cases where tasks get stuck in the THROTTLED state indefinitely.
> From what I can tell from the logs, this happens if a task is throttled when
> Aurora is shut down or a new leader is elected.
> It looks like the timer that changes the state from THROTTLED to PENDING is
> only set up on a transition to the THROTTLED state... it seems like there is
> no way to get these tasks running again except to restart them manually.
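
To illustrate the failure mode described above, here is a minimal sketch of
re-arming the THROTTLED -> PENDING timer when state is restored after a restart
or leader election. It is not Aurora's implementation: the class, the method
names, and the idea of a stored "throttled at" timestamp per task are all
assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Aurora code base.
public class ThrottleRecovery {
    enum State { THROTTLED, PENDING }

    // Task ID -> current state, standing in for the scheduler's task store.
    final Map<String, State> tasks = new ConcurrentHashMap<>();
    final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    // Time left on a penalty, clamped at zero so an already-expired
    // penalty is scheduled to fire immediately.
    static long remainingPenaltyMs(long throttledAtMs, long penaltyMs, long nowMs) {
        return Math.max(0L, throttledAtMs + penaltyMs - nowMs);
    }

    // Normal path: the timer is armed when a task enters THROTTLED.
    void throttle(String taskId, long penaltyMs) {
        tasks.put(taskId, State.THROTTLED);
        schedule(taskId, penaltyMs);
    }

    // Recovery path: after a restart or leader change, re-arm a timer for
    // every task that storage reports as THROTTLED (assumed input: task ID
    // mapped to the time it was throttled).
    void recover(Map<String, Long> throttledAtMs, long penaltyMs, long nowMs) {
        for (Map.Entry<String, Long> e : throttledAtMs.entrySet()) {
            tasks.put(e.getKey(), State.THROTTLED);
            schedule(e.getKey(), remainingPenaltyMs(e.getValue(), penaltyMs, nowMs));
        }
    }

    private void schedule(String taskId, long delayMs) {
        // Only move the task if it is still THROTTLED when the timer fires.
        timer.schedule(
            () -> { tasks.replace(taskId, State.THROTTLED, State.PENDING); },
            delayMs, TimeUnit.MILLISECONDS);
    }
}
```

Without something like recover(), a task that was THROTTLED when the old
leader went away has no pending timer on the new leader, which matches the
behavior reported here. The executor in this sketch uses a non-daemon thread,
so a real caller would need to shut it down.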