[ https://issues.apache.org/jira/browse/AURORA-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076975#comment-14076975 ]
Yan Xu commented on AURORA-608:
-------------------------------
I filed https://issues.apache.org/jira/browse/MESOS-1646 for this problem. I
don't think it's necessarily a master perf issue. The master in this case just
responds to scheduler requests with a longer delay. The sudden surge of
TASK_LOSTs in the master is a result of both the delay caused by scheduler
failover and the unfortunate timing of GCExecutor seppuku. This wouldn't be an
issue if the executor seppuku were initiated by the scheduler, because the
scheduler would then know not to send more tasks to the dead executor instance,
or to replace it with a new instance if it needs to run the task immediately.
I understand that GCExecutor is going to be deprecated or play a smaller role
with the master's reconciliation, but will it go away completely, or will this
still be an issue?
> GcExecutorLauncher should throttle initial activity spike
> ---------------------------------------------------------
>
> Key: AURORA-608
> URL: https://issues.apache.org/jira/browse/AURORA-608
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Maxim Khutornenko
> Assignee: Maxim Khutornenko
>
> The current implementation of the GcExecutorLauncher randomizes GC activity
> by spreading GC execution for different hosts over the hour. It does not,
> however, protect against the startup spike of accepted GC offers before the
> host cache is populated. This proved to be a perf problem for the Mesos master
> under certain conditions.