[ https://issues.apache.org/jira/browse/AURORA-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668008#comment-15668008 ]
Stephan Erb commented on AURORA-1820:
-------------------------------------
I am wondering: how did you find this, and how do we know it is an actual
problem?
Launching a task requires storage locks at several different stages: the
scheduling run, saving the received status updates, and the task timeout
check. Why is the last one special? It seems to be just one of many lock
acquisitions for a single task launch.
For the actual idea and code change: sounds good to me :)
> Reduce storage write lock contention by adopting Double-Checked Locking
> pattern in TimedOutTaskHandler
> ------------------------------------------------------------------------------------------------------
>
> Key: AURORA-1820
> URL: https://issues.apache.org/jira/browse/AURORA-1820
> Project: Aurora
> Issue Type: Task
> Components: Efficiency, Scheduler
> Reporter: Mehrdad Nurolahzade
> Assignee: Mehrdad Nurolahzade
> Priority: Critical
>
> {{TimedOutTaskHandler}} acquires the storage write lock for every task each
> time it transitions to a transient state. After a default time-out period of
> 5 minutes, it verifies whether the task has transitioned out of the
> transient state.
> The verification step takes place while holding the storage write lock. In
> over 99% of cases the logic short-circuits and returns from
> {{StateManagerImpl.updateTaskAndExternalState()}} as soon as it finds that
> the task has already transitioned out of the transient state.
> Reduce storage write lock contention by adopting the [Double-Checked
> Locking|https://en.wikipedia.org/wiki/Double-checked_locking] pattern in
> {{TimedOutTaskHandler.run()}}.
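A minimal sketch of the proposed check, assuming the scheduler storage can be modeled as a single ReentrantReadWriteLock and that the task state can be read cheaply without the write lock. TaskStore, State, handleTimeout and transitionIfStill are hypothetical names for illustration only, not Aurora's API; the real change would live in {{TimedOutTaskHandler.run()}} and go through {{StateManagerImpl}}. The first check skips the write lock in the common case; the second check, made under the write lock, keeps the transition correct if the task moved on in between.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TimeoutCheckSketch {
  // Hypothetical task states; ASSIGNED plays the role of a transient state.
  enum State { ASSIGNED, RUNNING, LOST }

  // Hypothetical store guarded by a read/write lock, standing in for scheduler storage.
  static final class TaskStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, State> tasks = new HashMap<>();
    // Counts write-lock acquisitions made by timeout checks only.
    final AtomicInteger timeoutWriteLocks = new AtomicInteger();

    State read(String taskId) {
      lock.readLock().lock();
      try {
        return tasks.get(taskId);
      } finally {
        lock.readLock().unlock();
      }
    }

    void put(String taskId, State state) {
      lock.writeLock().lock();
      try {
        tasks.put(taskId, state);
      } finally {
        lock.writeLock().unlock();
      }
    }

    // Re-checks the state under the write lock and transitions only if the
    // task is still in the expected state.
    boolean transitionIfStill(String taskId, State expected, State target) {
      lock.writeLock().lock();
      try {
        timeoutWriteLocks.incrementAndGet();
        if (tasks.get(taskId) != expected) {
          return false;
        }
        tasks.put(taskId, target);
        return true;
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  // Timeout check fired some time after the task entered transientState.
  static boolean handleTimeout(TaskStore store, String taskId, State transientState) {
    // First check: a read without the write lock. In the common case the task
    // has already left the transient state, so no write lock is taken at all.
    if (store.read(taskId) != transientState) {
      return false;
    }
    // Second check: the state may have changed after the read above,
    // so confirm it again while holding the write lock.
    return store.transitionIfStill(taskId, transientState, State.LOST);
  }

  public static void main(String[] args) {
    TaskStore store = new TaskStore();
    store.put("moved-on", State.RUNNING);
    store.put("stuck", State.ASSIGNED);

    System.out.println(handleTimeout(store, "moved-on", State.ASSIGNED)); // false
    System.out.println(handleTimeout(store, "stuck", State.ASSIGNED));    // true
    System.out.println(store.read("stuck"));                              // LOST
    System.out.println(store.timeoutWriteLocks.get());                    // 1
  }
}
```

This is not the textbook lazy-initialization form of double-checked locking; it applies the same check, lock, re-check idea to a state transition. Dropping the second check would be a bug: a task that reached RUNNING between the read and the write lock would be wrongly timed out.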
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)