[ https://issues.apache.org/jira/browse/AMBARI-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Toader updated AMBARI-17248:
--------------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Reduce the idle time before first command from next stage is executed on a
> host
> -------------------------------------------------------------------------------
>
> Key: AMBARI-17248
> URL: https://issues.apache.org/jira/browse/AMBARI-17248
> Project: Ambari
> Issue Type: Improvement
> Components: ambari-agent, ambari-server
> Reporter: Sebastian Toader
> Assignee: Sebastian Toader
> Fix For: 2.4.0
>
> Attachments: AMBARI-17248.trunk.v7.patch
>
>
> Commands to be executed by ambari-agents are sent by the server in its
> responses to agent heartbeat messages.
> When the server processes a received heartbeat, it checks whether further
> commands are scheduled for that ambari-agent and adds them to the heartbeat
> response.
> The server organises the commands that can be executed in parallel into
> stages. Ambari server ensures that only the commands of a single stage are
> scheduled to be executed by the agent, and starts scheduling the commands of
> the next stage only after all commands of the current stage have finished
> successfully.
> The command status received with the heartbeat message is processed in
> HeartBeatProcessor and ActionScheduler asynchronously to the creation of the
> heartbeat response, so when the heartbeat response is created the commands
> for the next stage are not yet scheduled. This means that the next commands
> are sent to the agent only with the next heartbeat.
> Agents currently send a heartbeat to the server on command completion or,
> if there are no commands to be executed, after a timeout of
> self.netutil.HEARTBEAT_IDDLE_INTERVAL_SEC -
> self.netutil.MINIMUM_INTERVAL_BETWEEN_HEARTBEATS, which is ~10 seconds.
> This means that when the server receives a heartbeat triggered by the
> completion of the last command of the current stage, it sends the commands
> for the next stage only ~10 seconds later, when the next heartbeat is
> received. As a result, agents spend a considerable amount of time idle when
> there are multiple stages to be executed.
> Agents should heartbeat at a higher rate while there are still pending stages
> to be executed.
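
The timing described above can be illustrated with a minimal Python sketch. It is not the code from the attached patch. The two constant names come from the description, but their values are assumed (the description only states that the resulting timeout is ~10 seconds). PENDING_STAGES_INTERVAL_SEC, heartbeat_wait_timeout and idle_time_between_stages are hypothetical names introduced for illustration, and how the agent learns that stages are still pending is not specified here, so it is modelled as a plain boolean input.

```python
# Assumed values: the description only says the idle timeout is ~10 seconds.
HEARTBEAT_IDDLE_INTERVAL_SEC = 10.0
MINIMUM_INTERVAL_BETWEEN_HEARTBEATS = 0.1

# Hypothetical shorter interval used while stages are still pending.
PENDING_STAGES_INTERVAL_SEC = 1.0


def heartbeat_wait_timeout(has_pending_stages):
    """Return how long (in seconds) the agent waits before its next heartbeat.

    has_pending_stages is a hypothetical flag; the description does not say
    how the agent would learn about pending stages.
    """
    if has_pending_stages:
        interval = PENDING_STAGES_INTERVAL_SEC
    else:
        interval = HEARTBEAT_IDDLE_INTERVAL_SEC
    # Same formula as in the description: interval minus the minimum gap,
    # clamped so the timeout never becomes negative.
    return max(interval - MINIMUM_INTERVAL_BETWEEN_HEARTBEATS, 0.0)


def idle_time_between_stages(stage_count, wait_timeout):
    """Approximate total idle time for a request with stage_count stages.

    Each transition between two consecutive stages costs roughly one
    heartbeat wait, because the commands of the next stage arrive only
    with the following heartbeat.
    """
    transitions = max(stage_count - 1, 0)
    return transitions * wait_timeout


if __name__ == "__main__":
    slow = heartbeat_wait_timeout(has_pending_stages=False)
    fast = heartbeat_wait_timeout(has_pending_stages=True)
    for stages in (2, 5, 10):
        print(stages,
              round(idle_time_between_stages(stages, slow), 1),
              round(idle_time_between_stages(stages, fast), 1))
```

Under these assumed values, the idle time grows linearly with the number of stage transitions, so shortening the wait only while stages are pending cuts that overhead without raising the heartbeat rate of hosts that have nothing to do.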