[ 
https://issues.apache.org/jira/browse/AURORA-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668008#comment-15668008
 ] 

Stephan Erb commented on AURORA-1820:
-------------------------------------

I am wondering: How did you find that and how do we know it is an actual 
problem? 

Launching a task acquires storage locks at several different stages: the 
scheduling run, the save operation for the received status updates, and the 
task timeout check. Why is the last one special? It seems to be just one of 
many times a lock is acquired for a single task launch.

For the actual idea and code change: sounds good to me :)

> Reduce storage write lock contention by adopting Double-Checked Locking 
> pattern in TimedOutTaskHandler
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AURORA-1820
>                 URL: https://issues.apache.org/jira/browse/AURORA-1820
>             Project: Aurora
>          Issue Type: Task
>          Components: Efficiency, Scheduler
>            Reporter: Mehrdad Nurolahzade
>            Assignee: Mehrdad Nurolahzade
>            Priority: Critical
>
> {{TimedOutTaskHandler}} acquires the storage write lock for every task every 
> time it transitions to a transient state. It then verifies, after a default 
> time-out period of 5 minutes, whether the task has transitioned out of the 
> transient state. 
> The verification step takes place while holding the storage write lock. In 
> over 99% of cases the logic short-circuits and returns from 
> {{StateManagerImpl.updateTaskAndExternalState()}} once it learns the task has 
> transitioned out of the transient state.
> Reduce storage write lock contention by adopting [Double-Checked 
> Locking|https://en.wikipedia.org/wiki/Double-checked_locking] pattern in 
> {{TimedOutTaskHandler.run()}}.
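
For reference, here is a minimal sketch of how the double-checked approach 
described above could look. It is not the actual Aurora code: SketchStorage, 
TimedOutTaskHandlerSketch, transitionIfStill() and the writeLockAcquisitions 
counter are hypothetical names made up for illustration, and a plain 
ReentrantReadWriteLock stands in for the scheduler's storage lock. The idea 
is to check the task state under the cheap read lock first, and to take the 
write lock (and check again) only if the task still appears stuck.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the scheduler storage; NOT Aurora's Storage API. */
class SketchStorage {
  enum Status { ASSIGNED, RUNNING, LOST }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Status> tasks = new HashMap<>();
  /** Counts write lock acquisitions so the effect of the first check is visible. */
  final AtomicInteger writeLockAcquisitions = new AtomicInteger();

  /** Reads a task's status under the shared read lock; null if unknown. */
  Status read(String taskId) {
    lock.readLock().lock();
    try {
      return tasks.get(taskId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Unconditionally stores a status under the write lock. */
  void put(String taskId, Status status) {
    lock.writeLock().lock();
    try {
      writeLockAcquisitions.incrementAndGet();
      tasks.put(taskId, status);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Second check: re-verifies the status under the write lock before changing it. */
  boolean transitionIfStill(String taskId, Status expected, Status next) {
    lock.writeLock().lock();
    try {
      writeLockAcquisitions.incrementAndGet();
      if (tasks.get(taskId) != expected) {
        return false; // State changed between the first check and the lock.
      }
      tasks.put(taskId, next);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }
}

/** Hypothetical timeout handler; runs once the time-out period has elapsed. */
class TimedOutTaskHandlerSketch implements Runnable {
  private final SketchStorage storage;
  private final String taskId;
  private final SketchStorage.Status transientStatus;

  TimedOutTaskHandlerSketch(
      SketchStorage storage, String taskId, SketchStorage.Status transientStatus) {
    this.storage = storage;
    this.taskId = taskId;
    this.transientStatus = transientStatus;
  }

  @Override
  public void run() {
    // First check: read lock only. The common case returns here.
    if (storage.read(taskId) != transientStatus) {
      return;
    }
    // The task still looks stuck, so take the write lock and check again.
    storage.transitionIfStill(taskId, transientStatus, SketchStorage.Status.LOST);
  }
}
```

With this shape, a task that has already left the transient state never 
touches the write lock in the timeout path. The second check under the write 
lock is still needed, because the state can change between the read and the 
write lock acquisition.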



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
