Mehrdad Nurolahzade created AURORA-1869:
-------------------------------------------
Summary: Investigate the status update processing overhead
Key: AURORA-1869
URL: https://issues.apache.org/jira/browse/AURORA-1869
Project: Aurora
Issue Type: Task
Components: Scheduler
Reporter: Mehrdad Nurolahzade
Priority: Minor
The rate of task status update events received from Mesos closely tracks the
rate of JVM threads started by the system
([graphview|http://192.168.33.7:8081/graphview?query=rate(jvm_threads_started)%0Arate(scheduler_status_update_events)]).
It appears that a new thread is started every time a status update event is
processed.
{{TaskStatusHandlerImpl}} is a singleton service, so it should not be
instantiating new threads. Looking at status update reasons/results, the majority
of status updates are associated with {{RECONCILIATION}} and should result in
{{NOOP}}. They should therefore have no impact on the internal workers: the
task state machine should short-circuit and return as soon as it sees that the
task’s reported new state matches the scheduler’s view.
{code:title=TaskStateMachine.updateState()}
if (stateMachine.getState() == taskState) {
  return new TransitionResult(NOOP, ImmutableSet.of());
}
{code}
Given the volume of status update events received during reconciliation, this
overhead should be avoided if possible.
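
One way to check the suspected pattern outside the scheduler is a small standalone probe. The sketch below is not Aurora code: {{ThreadStartProbe}}, {{handle}}, {{threadPerEvent}} and {{sharedWorker}} are hypothetical names, and it assumes that {{jvm_threads_started}} reflects something like the JVM's total started thread count as exposed by {{ThreadMXBean}}. It compares a handler that starts one thread per event with one that reuses a single worker thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical probe; not part of Aurora. */
public class ThreadStartProbe {
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  /** Stand-in for processing one status update; intentionally does nothing. */
  static void handle(int event) {
  }

  /** Starts a fresh thread for every event and returns how many threads the JVM started. */
  public static long threadPerEvent(int events) {
    long before = THREADS.getTotalStartedThreadCount();
    try {
      for (int i = 0; i < events; i++) {
        final int event = i;
        Thread t = new Thread(() -> handle(event));
        t.start();
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return THREADS.getTotalStartedThreadCount() - before;
  }

  /** Reuses one long-lived worker for all events and returns how many threads the JVM started. */
  public static long sharedWorker(int events) {
    long before = THREADS.getTotalStartedThreadCount();
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      for (int i = 0; i < events; i++) {
        final int event = i;
        worker.execute(() -> handle(event));
      }
      worker.shutdown();
      worker.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return THREADS.getTotalStartedThreadCount() - before;
  }

  public static void main(String[] args) {
    System.out.println("thread per event: " + threadPerEvent(50));
    System.out.println("shared worker:    " + sharedWorker(50));
  }
}
```

If the scheduler behaves like the first variant, the started-thread count should grow roughly in step with the number of events, which would match the graph above. With a shared worker, the count should stay nearly flat regardless of event volume.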
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)