-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/
-----------------------------------------------------------

(Updated Feb. 29, 2016, 12:59 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, 
Sebastian Toader, and Sid Wagle.


Changes
-------

3rd time's the charm, right? 

Finally reproduced it (again). It's a different problem, actually, but in the 
same basic area. In this case, multiple cache invalidations occur between the 
time the LoadingCache loader is invoked and the time Guava finally sets the 
cached value. Because there is no 1:1 relationship between invalidations and 
cache reloads, only the first repopulated item makes it into the cache, even 
though there were 2 invalidations. The cache is never told to invalidate that 
item again, so it never fetches the newest data in response to the 2nd 
invalidation.
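
To make the interleaving concrete, here is a minimal single-threaded replay of 
it. This is only a sketch: it uses plain maps rather than Guava, and the class, 
field, and method names are hypothetical stand-ins, not code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

class StaleReloadSketch {
    // Hypothetical stand-ins: "database" is the source of truth,
    // "cache" plays the role of the status summary cache.
    static final Map<Long, String> database = new HashMap<>();
    static final Map<Long, String> cache = new HashMap<>();

    /** Replays the interleaving on one thread and returns what ends up cached. */
    static String replay() {
        long stageId = 1L;
        database.clear();
        cache.clear();
        database.put(stageId, "PENDING");
        cache.put(stageId, "PENDING");

        // 1st write + invalidation: the old entry is removed as expected.
        database.put(stageId, "IN_PROGRESS");
        cache.remove(stageId);

        // A reader misses the cache; its loader reads the current row,
        // but the result has not been stored in the cache yet.
        String loaded = database.get(stageId);

        // 2nd write + invalidation: nothing is cached yet, so there is
        // nothing to remove and this invalidation is effectively lost.
        database.put(stageId, "HOLDING");
        cache.remove(stageId);

        // The loader finally stores the value it read before the 2nd write.
        cache.put(stageId, loaded);
        return cache.get(stageId);
    }

    public static void main(String[] args) {
        System.out.println("cached=" + replay() + " database=" + database.get(1L));
    }
}
```

Running `main` prints `cached=IN_PROGRESS database=HOLDING`, which is the same 
mismatch described in this review.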

Additionally, there was a race condition to get the lock in the first place, 
which has been fixed by pre-initializing all locks when the singleton is 
injected.
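
As a rough illustration of what pre-initializing the locks means, the sketch 
below creates every lock in the constructor so that callers only ever read from 
an immutable map. The class name, the enum, and its values are assumptions made 
for the example; the real types in the patch may look different.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class EagerLocksSketch {
    /** Hypothetical lock areas; the real enum in the patch may differ. */
    enum Area { STATUS_CACHE, OTHER_AREA }

    private final Map<Area, ReadWriteLock> locks;

    EagerLocksSketch() {
        // Create one lock per area up front. Two threads asking for the same
        // area can then never race to create (and end up holding) different locks.
        Map<Area, ReadWriteLock> all = new EnumMap<>(Area.class);
        for (Area area : Area.values()) {
            all.put(area, new ReentrantReadWriteLock(true));
        }
        locks = Collections.unmodifiableMap(all);
    }

    /** Always returns the same lock instance for a given area. */
    ReadWriteLock getLock(Area area) {
        return locks.get(area);
    }
}
```

With lazy creation, the lookup itself would need synchronization; building the 
map once when the singleton is constructed sidesteps that.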


Bugs: AMBARI-15173
    https://issues.apache.org/jira/browse/AMBARI-15173


Repository: ambari


Description
-------

While performing an upgrade, it's possible for the status of a request/stage 
to not match that of its tasks. Essentially, the task could be {{HOLDING}} 
while the request is still {{IN_PROGRESS}}.

I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 
introduced, among other things, a cache for the 
{{HostRoleCommandStatusSummaryDTO}}, which is an aggregation of the number of 
tasks a stage has in each state (PENDING, HOLDING, etc).

This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to 
calculate a stage's and request's status based on the tasks. 

The problem is that {{ServerActionExecutor}} is moving a task's state to 
{{HOLDING}} (reflected in the database correctly) but the cache invalidation 
happens inside the uncommitted transaction. This causes stale data to be 
re-cached. So, when we go to calculate the request and stage status, we get 
{{IN_PROGRESS}} instead of {{HOLDING}}.

{code}
{
  "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
  "Stage": {
    "cluster_name": "cl1",
    "context": "Stop YARN Queues",
    "display_status": "IN_PROGRESS",
    "end_time": -1,
    "progress_percent": 35,
    "request_id": 61,
    "skippable": true,
    "stage_id": 1,
    "start_time": 1456227329191,
    "status": "IN_PROGRESS"
  },
  "tasks": [
    {
      "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
      "Tasks": {
        "attempt_cnt": 1,
        "cluster_name": "cl1",
        "command": "EXECUTE",
        "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this step since the clients will retry on their own.",
        "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
        "end_time": -1,
        "error_log": "errors-754.txt",
        "exit_code": 0,
        "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
        "id": 754,
        "output_log": "output-754.txt",
        "request_id": 61,
        "role": "AMBARI_SERVER_ACTION",
        "stage_id": 1,
        "start_time": 1456227329191,
        "status": "HOLDING",
        "stderr": "",
        "stdout": "",
        "structured_out": {}
      }
    }
  ]
}
{code}
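
One general way to avoid re-caching data from an uncommitted transaction is to 
defer the invalidation until the commit has happened. The sketch below 
illustrates that idea only; it is not necessarily how this patch solves it, and 
every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AfterCommitSketch {
    final Map<Long, String> committed = new HashMap<>();       // rows visible to readers
    final Map<Long, String> cache = new HashMap<>();           // cached statuses
    private final Map<Long, String> pending = new HashMap<>(); // uncommitted writes
    private final List<Runnable> afterCommit = new ArrayList<>();

    /** Stages a write and queues the invalidation instead of running it now. */
    void update(long id, String status) {
        pending.put(id, status);
        afterCommit.add(() -> cache.remove(id));
    }

    /** Reads through the cache; a miss loads only committed data. */
    String read(long id) {
        return cache.computeIfAbsent(id, committed::get);
    }

    /** Makes the writes visible first, then runs the queued invalidations. */
    void commit() {
        committed.putAll(pending);
        pending.clear();
        afterCommit.forEach(Runnable::run);
        afterCommit.clear();
    }
}
```

A reader that shows up between `update` and `commit` still sees (and caches) 
the old status, but the invalidation that runs after the commit evicts it, so 
the next read picks up {{HOLDING}}.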


Diffs (updated)
-----

  
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java 003e2e6 
  ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java b5442c2 
  ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLocks.java 1768dd8 
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java c2ded2f 
  ambari-server/src/test/java/org/apache/ambari/annotations/LockAreaTest.java PRE-CREATION 
  ambari-server/src/test/java/org/apache/ambari/annotations/TransactionalLockInterceptorTest.java 6ebdc0b 
  ambari-server/src/test/java/org/apache/ambari/annotations/TransactionalLockTest.java 1862088 

Diff: https://reviews.apache.org/r/43967/diff/


Testing
-------

Pending unit tests...


Thanks,

Jonathan Hurley
