[ 
https://issues.apache.org/jira/browse/AURORA-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076975#comment-14076975
 ] 

Yan Xu commented on AURORA-608:
-------------------------------

I filed https://issues.apache.org/jira/browse/MESOS-1646 for this problem. I 
think it's not necessarily a master perf issue. The master in this case just 
responds to scheduler requests with a longer delay. The sudden surge of 
TASK_LOSTs in the master is a result of both the delay caused by scheduler 
failover and the unfortunate timing of the GCExecutor seppuku. This wouldn't be 
an issue if the executor seppuku were initiated by the scheduler, because the 
scheduler would then know not to send more tasks to the dead executor instance, 
or to replace it with a new instance if it needs to run a task immediately.

I understand that GCExecutor is going to be deprecated or play a smaller role 
with the master's reconciliation, but will it go away completely, or will this 
still be an issue?

> GcExecutorLauncher should throttle initial activity spike
> ---------------------------------------------------------
>
>                 Key: AURORA-608
>                 URL: https://issues.apache.org/jira/browse/AURORA-608
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Maxim Khutornenko
>
> The current implementation of the GcExecutorLauncher randomizes GC activity 
> by spreading GC execution for different hosts over the hour. It does not, 
> however, protect against the startup spike of accepted GC offers before the 
> host cache is populated. This proved to be a perf problem for the Mesos 
> master under certain conditions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
