[ https://issues.apache.org/jira/browse/AURORA-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668008#comment-15668008 ]
Stephan Erb commented on AURORA-1820:
-------------------------------------

I am wondering: How did you find that and how do we know it is an actual problem?

Launching a task requires storage locks at multiple different times and stages: the scheduling run, the save operation for the received status updates, the task timeout running. Why is the latter one special? It seems to be just one of many times a lock is acquired for a single task launch.

For the actual idea and code change: sounds good to me :)

> Reduce storage write lock contention by adopting Double-Checked Locking pattern in TimedOutTaskHandler
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AURORA-1820
>                 URL: https://issues.apache.org/jira/browse/AURORA-1820
>             Project: Aurora
>          Issue Type: Task
>          Components: Efficiency, Scheduler
>            Reporter: Mehrdad Nurolahzade
>            Assignee: Mehrdad Nurolahzade
>            Priority: Critical
>
> {{TimedOutTaskHandler}} acquires storage write lock for every task every time they transition to a transient state. It then verifies after a default time-out period of 5 minutes if the task has transitioned out of the transient state.
> The verification step takes place while holding the storage write lock. In over 99% of cases the logic short-circuits and returns from {{StateManagerImpl.updateTaskAndExternalState()}} once it learns task has transitioned out of the transient state.
> Reduce storage write lock contention by adopting [Double-Checked Locking|https://en.wikipedia.org/wiki/Double-checked_locking] pattern in {{TimedOutTaskHandler.run()}}.
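
For illustration only, the double-checked locking idea from the issue description might look roughly like the sketch below. It is not Aurora's code: {{TimeoutSketch}}, {{Store}}, {{TimedOutHandler}}, the three-value {{State}} enum, and the {{writeLockAttempts}} counter are all hypothetical stand-ins, and the sketch assumes that a task's state can be read without taking the write lock and that a timed-out task is moved to a {{LOST}} state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins for illustration; not Aurora's actual classes or API.
final class TimeoutSketch {
  // ASSIGNED plays the role of a transient state in this sketch.
  enum State { ASSIGNED, RUNNING, LOST }

  /** Toy task store guarded by a read/write lock. */
  static final class Store {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, State> tasks = new HashMap<>();

    void put(String id, State state) {
      lock.writeLock().lock();
      try {
        tasks.put(id, state);
      } finally {
        lock.writeLock().unlock();
      }
    }

    /** Reads the state under the shared read lock only. */
    State get(String id) {
      lock.readLock().lock();
      try {
        return tasks.get(id);
      } finally {
        lock.readLock().unlock();
      }
    }

    /** Under the write lock: moves the task to next only if it is still in expected. */
    boolean transitionIf(String id, State expected, State next) {
      lock.writeLock().lock();
      try {
        if (tasks.get(id) != expected) {
          return false; // second check failed: the task moved on in the meantime
        }
        tasks.put(id, next);
        return true;
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  /** Runs once the timeout for a transient state has elapsed. */
  static final class TimedOutHandler implements Runnable {
    private final Store store;
    private final String taskId;
    private final State transientState;
    int writeLockAttempts = 0; // instrumentation for the sketch only
    boolean timedOut = false;

    TimedOutHandler(Store store, String taskId, State transientState) {
      this.store = store;
      this.taskId = taskId;
      this.transientState = transientState;
    }

    @Override
    public void run() {
      // First check, without the write lock: the common case returns here.
      if (store.get(taskId) != transientState) {
        return;
      }
      // Second check, under the write lock, before actually changing state.
      writeLockAttempts++;
      timedOut = store.transitionIf(taskId, transientState, State.LOST);
    }
  }
}
```

In this sketch a task that has already left the transient state never touches the write lock. The re-check inside {{transitionIf}} still guards against a state change that lands between the first check and the lock acquisition.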