[jira] [Updated] (AURORA-1869) Investigate the status update processing overhead

Mehrdad Nurolahzade (JIRA) Wed, 21 Dec 2016 12:11:25 -0800

     [ https://issues.apache.org/jira/browse/AURORA-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mehrdad Nurolahzade updated AURORA-1869:
----------------------------------------
    Description: 
There is a peculiar correlation between the rate of task status update events 
received from Mesos and the rate of JVM threads started by the system 
([graphview|http://192.168.33.7:8081/graphview?query=rate(jvm_threads_started)%0Arate(scheduler_status_update_events)]).
It appears that a new thread is started every time a status update event is 
processed.
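
One way to test the thread-per-event hypothesis in isolation is to read the JVM's cumulative started-thread counter before and after processing a batch of events. The sketch below uses the standard {{ThreadMXBean.getTotalStartedThreadCount()}}; the class {{ThreadStartProbe}}, the placeholder {{handleEvent}} and the two dispatch strategies are hypothetical stand-ins, not Aurora code. A handler that is really single-threaded should add roughly one started thread for the whole batch, whereas one that spawns a thread per event grows the counter by at least one per event.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadStartProbe {
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  // Hypothetical stand-in for processing one status update event.
  static void handleEvent(int eventId) {
    // Intentionally empty: only the dispatch strategy matters here.
  }

  // Anti-pattern: a fresh thread per event, so the started-thread
  // counter grows by at least one for every event.
  public static long startsWithThreadPerEvent(int events) {
    long before = THREADS.getTotalStartedThreadCount();
    for (int i = 0; i < events; i++) {
      final int id = i;
      Thread t = new Thread(() -> handleEvent(id));
      t.start();
      joinQuietly(t);
    }
    return THREADS.getTotalStartedThreadCount() - before;
  }

  // Expected behavior of a single-threaded handler: one reused worker
  // drains every event, so the counter should barely move.
  public static long startsWithSingleWorker(int events) {
    long before = THREADS.getTotalStartedThreadCount();
    ExecutorService worker = Executors.newSingleThreadExecutor();
    for (int i = 0; i < events; i++) {
      final int id = i;
      worker.execute(() -> handleEvent(id));
    }
    worker.shutdown();
    try {
      worker.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return THREADS.getTotalStartedThreadCount() - before;
  }

  private static void joinQuietly(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("thread per event: " + startsWithThreadPerEvent(1000));
    System.out.println("single worker:    " + startsWithSingleWorker(1000));
  }
}
```

The first count should track the number of events and the second should stay close to one; exact values can vary slightly because unrelated JVM threads may start during the measurement window.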

{{TaskStatusHandlerImpl}} is a single-threaded service; therefore, it should not 
instantiate new threads. Judging by status update reasons and results, the majority 
of status updates are associated with {{RECONCILIATION}} and should result in 
{{NOOP}}, so they should have no impact on the internal workers. The 
task state machine should short-circuit and return as soon as it determines that the 
task’s newly reported state matches the scheduler’s view.

{code:title=TaskStateMachine.updateState()}
if (stateMachine.getState() == taskState) {
  return new TransitionResult(NOOP, ImmutableSet.of());
}
{code}

Given the volume of status update events received during reconciliation, this 
overhead should be avoided if possible.
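
For illustration, a self-contained sketch of the same short-circuit, under simplifying assumptions: {{ShortCircuitSketch}}, its {{TaskState}} and {{Result}} enums, and the side-effect list are hypothetical and are not Aurora's {{TaskStateMachine}} or {{TransitionResult}}. The equality check runs before any side-effect bookkeeping, so a reconciliation burst that keeps reporting the known state costs one comparison per event.

```java
import java.util.ArrayList;
import java.util.List;

public class ShortCircuitSketch {
  public enum TaskState { PENDING, ASSIGNED, RUNNING, FINISHED }

  public enum Result { NOOP, SUCCESS }

  private TaskState current;
  // Hypothetical record of side effects (storage writes, scheduled work, ...).
  private final List<String> sideEffects = new ArrayList<>();

  public ShortCircuitSketch(TaskState initial) {
    this.current = initial;
  }

  public Result updateState(TaskState reported) {
    // Short-circuit: the reported state already matches the scheduler's
    // view, so nothing is recorded and no follow-up work is produced.
    if (current == reported) {
      return Result.NOOP;
    }
    // Transition legality checks are deliberately omitted in this sketch.
    sideEffects.add(current + " -> " + reported);
    current = reported;
    return Result.SUCCESS;
  }

  public int sideEffectCount() {
    return sideEffects.size();
  }

  public static void main(String[] args) {
    ShortCircuitSketch task = new ShortCircuitSketch(TaskState.RUNNING);
    int noops = 0;
    // Simulate a reconciliation burst that keeps reporting the known state.
    for (int i = 0; i < 1000; i++) {
      if (task.updateState(TaskState.RUNNING) == Result.NOOP) {
        noops++;
      }
    }
    // Prints "1000 NOOPs, 0 side effects".
    System.out.println(noops + " NOOPs, " + task.sideEffectCount() + " side effects");
  }
}
```

If thread starts still track status update events even though the state machine returns {{NOOP}} this cheaply, the overhead would have to originate elsewhere in the status update pipeline.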

  was:
There is a peculiar correlation between the rate of task status update events 
received from Mesos and the rate of JVM threads started by the system 
([graphview|http://192.168.33.7:8081/graphview?query=rate(jvm_threads_started)%0Arate(scheduler_status_update_events)]).
It appears that a new thread is started every time a status update event is 
processed.

{{TaskStatusHandlerImpl}} is a singleton service; therefore, it should not 
instantiate new threads. Judging by status update reasons and results, the majority 
of status updates are associated with {{RECONCILIATION}} and should result in 
{{NOOP}}, so they should have no impact on the internal workers. The 
task state machine should short-circuit and return as soon as it determines that the 
task’s newly reported state matches the scheduler’s view.

{code:title=TaskStateMachine.updateState()}
if (stateMachine.getState() == taskState) {
  return new TransitionResult(NOOP, ImmutableSet.of());
}
{code}

Given the volume of status update events received during reconciliation, this 
overhead should be avoided if possible.


> Investigate the status update processing overhead
> -------------------------------------------------
>
>                 Key: AURORA-1869
>                 URL: https://issues.apache.org/jira/browse/AURORA-1869
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Mehrdad Nurolahzade
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
