-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/
-----------------------------------------------------------

Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, 
Sebastian Toader, and Sid Wagle.


Bugs: AMBARI-15173
    https://issues.apache.org/jira/browse/AMBARI-15173


Repository: ambari


Description
-------

Seen while performing an upgrade, it's possible that the status of a 
request/stage does not match that of its tasks. Essentially, the task could be 
{{HOLDING}} while the request is still {{IN_PROGRESS}}.

I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 
introduced, among other things, a cache to the 
{{HostRoleCommandStatusSummaryDTO}} which is a aggregation of the number of 
tasks a stage has in each state (PENDING, HOLDING, etc).

This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to 
calculate a stage's and request's status based on the tasks. 

The problem is that {{ServerActionExecutor}} is moving a tasks's state to 
{{HOLDING}} (reflected in the database correctly) but the cache invalidation 
happens inside the uncommitted transaction. This causes stale data to be 
re-cached. So, when we go to calculate the request and state status, we get 
{{IN_PROGRESS}} instead of {{HOLDING}}.

{code}
{
  "href": 
"http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*";,
  "Stage": {
    "cluster_name": "cl1",
    "context": "Stop YARN Queues",
    "display_status": "IN_PROGRESS",
    "end_time": -1,
    "progress_percent": 35,
    "request_id": 61,
    "skippable": true,
    "stage_id": 1,
    "start_time": 1456227329191,
    "status": "IN_PROGRESS"
  },
  "tasks": [
    {
      "href": 
"http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754";,
      "Tasks": {
        "attempt_cnt": 1,
        "cluster_name": "cl1",
        "command": "EXECUTE",
        "command_detail": "Before continuing, please stop all YARN queues. If 
yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to 
true, then you can skip this step since the clients will retry on their own.",
        "custom_command_name": 
"org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
        "end_time": -1,
        "error_log": "errors-754.txt",
        "exit_code": 0,
        "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
        "id": 754,
        "output_log": "output-754.txt",
        "request_id": 61,
        "role": "AMBARI_SERVER_ACTION",
        "stage_id": 1,
        "start_time": 1456227329191,
        "status": "HOLDING",
        "stderr": "",
        "stdout": "",
        "structured_out": {}
      }
    }
  ]
}
{code}


Diffs
-----

  
ambari-server/src/main/java/com/google/inject/persist/jpa/AmbariJpaPersistModule.java
 4e4dd35 
  
ambari-server/src/main/java/org/apache/ambari/annotations/TransactionalLock.java
 PRE-CREATION 
  
ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java
 6d7901c 
  
ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLockInterceptor.java
 PRE-CREATION 
  
ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLocks.java
 PRE-CREATION 
  
ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
 deca9b1 
  ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.4.xml 
29ebc1f 

Diff: https://reviews.apache.org/r/43967/diff/


Testing
-------

Pending unit tests...


Thanks,

Jonathan Hurley

Reply via email to