[ 
https://issues.apache.org/jira/browse/AMBARI-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Toader updated AMBARI-17248:
--------------------------------------
    Fix Version/s:     (was: 2.5.0)
                       (was: 2.2-next)
                   2.4.0

> Reduce the idle time before first command from next stage is executed on a 
> host
> -------------------------------------------------------------------------------
>
>                 Key: AMBARI-17248
>                 URL: https://issues.apache.org/jira/browse/AMBARI-17248
>             Project: Ambari
>          Issue Type: Improvement
>          Components: ambari-agent, ambari-server
>            Reporter: Sebastian Toader
>            Assignee: Sebastian Toader
>             Fix For: 2.4.0
>
>         Attachments: AMBARI-17248.trunk.patch
>
>
> Commands to be executed by ambari-agents are sent by the server in its 
> responses to agent heartbeat messages. 
> When the server processes a received heartbeat, it checks whether any 
> commands are scheduled to be executed by that ambari-agent and adds them to 
> the heartbeat response.
> The server organises the commands that can be executed in parallel into 
> stages. Ambari server ensures that only the commands of a single stage are 
> scheduled to be executed by the agent, and starts scheduling the commands of 
> the next stage only after all commands of the current stage have finished 
> successfully.
> The command statuses received with the heartbeat message are processed in 
> HeartBeatProcessor and ActionScheduler asynchronously to the creation of the 
> heartbeat response. Thus, when the heartbeat response is created, the 
> commands for the next stage are not scheduled yet. This means that the next 
> commands will be sent to the agent only with the next heartbeat.
> Agents currently send a heartbeat to the server on command completion or, 
> if there are no commands to be executed, after a timeout of 
> self.netutil.HEARTBEAT_IDDLE_INTERVAL_SEC - 
> self.netutil.MINIMUM_INTERVAL_BETWEEN_HEARTBEATS, which is ~10 seconds.
> This means that when the server receives a heartbeat triggered by the 
> completion of the last command of the current stage, it will send the 
> commands for the next stage only ~10 seconds later, when the next heartbeat 
> is received. As a result, agents spend a considerable amount of time idle 
> when there are multiple stages to be executed.
> Agents should heartbeat at a higher rate while there are still pending stages 
> to be executed.
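
The proposal above can be sketched roughly as follows. This is only an illustration and is not taken from the attached patch. The has_pending_stages flag, the FAST_HEARTBEAT_INTERVAL_SEC value, and both function names are hypothetical. The two constant values are also assumptions, since the description only says that their difference is about 10 seconds.

```python
# Assumed values: the issue only states that the idle timeout is ~10 seconds.
HEARTBEAT_IDDLE_INTERVAL_SEC = 10.0
MINIMUM_INTERVAL_BETWEEN_HEARTBEATS = 0.1

# Hypothetical faster rate used while stages are still pending.
FAST_HEARTBEAT_INTERVAL_SEC = 1.0


def heartbeat_wait_timeout(has_pending_stages):
    """Return how long an idle agent waits before its next heartbeat.

    has_pending_stages is a hypothetical flag telling the agent that the
    server still has stages to schedule for the current request.
    """
    if has_pending_stages:
        return FAST_HEARTBEAT_INTERVAL_SEC
    return HEARTBEAT_IDDLE_INTERVAL_SEC - MINIMUM_INTERVAL_BETWEEN_HEARTBEATS


def worst_case_idle_time(stage_count, wait_timeout):
    """Upper bound on the idle time accumulated between stages.

    After every stage except the last, the agent may wait one full
    timeout before the heartbeat that fetches the next stage's commands.
    """
    return max(stage_count - 1, 0) * wait_timeout


if __name__ == "__main__":
    for pending in (False, True):
        timeout = heartbeat_wait_timeout(pending)
        idle = worst_case_idle_time(5, timeout)
        print("pending=%s timeout=%.1fs worst-case idle=%.1fs"
              % (pending, timeout, idle))
```

Under these assumed numbers, a request with five stages could accumulate roughly 40 seconds of idle time at the current rate, versus about 4 seconds at the faster rate.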



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
