[ 
https://issues.apache.org/jira/browse/AMBARI-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Lysnichenko updated AMBARI-4324:
---------------------------------------

    Description: 
As of now, the task timeout at the server and the timeout at the agent are two 
different mechanisms that work independently and duplicate each other. 

Such behaviour leads to a strange scenario:
- cluster installation is started
- execution of some command exceeds the timeout
- the server considers this command and *all subsequent* commands in the request 
timed out. This state is shown in the UI as well.
- at the same time, the agent considers the currently executing command timed out 
and kills it. After that, the agent starts executing the next command in the queue. 
If the following commands do not fail, the agent sends COMPLETE status reports.
- the server receives the COMPLETE status reports and updates component status.
- if the user clicks "Retry installation", tasks are created only for components 
that are not installed.
- as a result, the UI shows fewer tasks than the user expects
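
The scenario above can be reproduced with a small simulation. This is only a sketch under assumptions: the function name, the task names and the durations are hypothetical and are not taken from the Ambari code base; only the COMPLETE and TIMEDOUT status names come from this issue.

```python
def simulate(durations, timeout):
    """Model one request in which server and agent track timeouts independently.

    durations: dict of task name -> seconds the command needs (hypothetical data).
    Returns (server_view, retry_tasks).
    """
    server_view = {}       # what the UI shows for each task
    installed = set()      # component status, updated from COMPLETE reports
    server_gave_up = False

    for name, duration in durations.items():
        agent_ok = duration <= timeout
        if not agent_ok:
            # Server side: this task and all subsequent ones are marked timed out.
            server_gave_up = True
        server_view[name] = "TIMEDOUT" if server_gave_up else "COMPLETE"
        if agent_ok:
            # Agent side: the queue keeps running and a COMPLETE report arrives,
            # so the component is recorded as installed anyway.
            installed.add(name)

    # "Retry installation" creates tasks only for components not installed.
    retry_tasks = [name for name in durations if name not in installed]
    return server_view, retry_tasks


server_view, retry_tasks = simulate({"A": 10, "B": 100, "C": 10}, timeout=60)
print(server_view)   # B and C are shown as timed out
print(retry_tasks)   # but only B is retried
```

With these inputs the UI shows two timed-out tasks while the retry creates a single task, which is the mismatch described in the last step.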

h1. Will be fixed by AMBARI-4323

  was:
h1. AMBARI-4323

-As of now, the task timeout at the server and the timeout at the agent are two 
different mechanisms that work independently and duplicate each other. 

Such behaviour leads to a strange scenario:
- cluster installation is started
- execution of some command exceeds the timeout
- the server considers this command and *all subsequent* commands in the request 
timed out. This state is shown in the UI as well.
- at the same time, the agent considers the currently executing command timed out 
and kills it. After that, the agent starts executing the next command in the queue. 
If the following commands do not fail, the agent sends COMPLETE status reports.
- the server receives the COMPLETE status reports and updates component status.
- if the user clicks "Retry installation", tasks are created only for components 
that are not installed.
- as a result, the UI shows fewer tasks than the user expects

Changes in scope of this jira:
add a TIMEDOUT command status report type at the agent. On the server side, the 
HostRoleStatus enum already has this status type. Modify server behaviour: the 
server considers a task timed out when it receives the corresponding command 
report from the agent. This way, all task time tracking logic is consolidated at 
the agent. Doing that will simplify timeout handling for CustomCommands and 
CustomActions.

Some issues may occur when the agent host goes down and therefore does not send 
any command reports. The server should have some handling for such a case.-
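
The change described above could look roughly like the following. This is a minimal sketch, not the Ambari implementation: `run_command`, `ServerTask`, the report dictionary shape, the FAILED and IN_PROGRESS names and the `silence_limit` fallback are all assumptions made for illustration. Only the TIMEDOUT and COMPLETE status names and the idea of a server-side fallback for a silent agent come from this issue.

```python
import subprocess
import sys


def run_command(argv, timeout):
    """Agent side: run one command and build a command report (hypothetical shape)."""
    try:
        result = subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising; the agent reports
        # the new TIMEDOUT status instead of leaving the server to guess.
        return {"status": "TIMEDOUT"}
    return {"status": "COMPLETE" if result.returncode == 0 else "FAILED"}


class ServerTask:
    """Server side: task status follows agent reports, not a server-side timer."""

    def __init__(self, now):
        self.status = "IN_PROGRESS"
        self.last_heard = now   # last time the agent sent anything for this task

    def on_report(self, report, now):
        self.status = report["status"]
        self.last_heard = now

    def on_tick(self, now, silence_limit):
        # Assumed fallback for an agent host that went down and sends no reports.
        if self.status == "IN_PROGRESS" and now - self.last_heard > silence_limit:
            self.status = "TIMEDOUT"


if __name__ == "__main__":
    task = ServerTask(now=0)
    report = run_command([sys.executable, "-c", "import time; time.sleep(30)"], timeout=0.5)
    task.on_report(report, now=1)
    print(task.status)
```

Keeping the timer only at the agent means a single component decides when a command has run too long; the silence check covers the case where that component is no longer able to report.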




> Server should rely on command reports when considering tasks timed out
> ----------------------------------------------------------------------
>
>                 Key: AMBARI-4324
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4324
>             Project: Ambari
>          Issue Type: Improvement
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0
>
>
> As of now, the task timeout at the server and the timeout at the agent are two 
> different mechanisms that work independently and duplicate each other. 
> Such behaviour leads to a strange scenario:
> - cluster installation is started
> - execution of some command exceeds the timeout
> - the server considers this command and *all subsequent* commands in the request 
> timed out. This state is shown in the UI as well.
> - at the same time, the agent considers the currently executing command timed out 
> and kills it. After that, the agent starts executing the next command in the 
> queue. If the following commands do not fail, the agent sends COMPLETE status 
> reports.
> - the server receives the COMPLETE status reports and updates component status.
> - if the user clicks "Retry installation", tasks are created only for components 
> that are not installed.
> - as a result, the UI shows fewer tasks than the user expects
> h1. Will be fixed by AMBARI-4323



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
