Re: Review Request 43967: Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem

Jonathan Hurley Fri, 26 Feb 2016 09:03:45 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/
-----------------------------------------------------------


(Updated Feb. 26, 2016, 12:02 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, 
Sebastian Toader, and Sid Wagle.


Changes
-------

So, my fear turned out to be valid; there is potentially an "outer transaction" 
which eclipses the transaction we're trying to lock around.

```
@Transactional
public void foo() {
  hostRoleCommandDAO.bar();
}

// in HostRoleCommandDAO
@Transactional
@TransactionalLock
public void bar() {}
```

Because the foo() method is transactional, a transaction is started before the 
method we decorated is called. Yes, we could walk backward and try to find all 
invocations and decorate them too with the TransactionalLock, but this approach 
is brittle. A future change could easily add a new invocation of bar() from a 
transaction, and the cache would begin failing again with no obvious reason why.

This new solution basically builds the work into the existing interceptor. If 
during the course of the thread's traversal through the stack, it encounters a 
TransactionalLock, it will lock on it, but it won't release it until the outer 
transaction is committed. Here's the workflow:

```
fooInterceptor
  fooTransaction.begin
    fooTransaction.proceed 
      mergeInterceptor
        lock
        proceed (no new transaction)
    fooTransaction.commit
  unlock
```

Essentially, any TransactionalLocks are locked during Joinpoint.proceed(), 
and only released once the transaction has committed. Because it's the same 
thread doing all of the work, re-entrancy is not an issue.
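
To make the workflow above concrete, here is a minimal sketch of the deferred-unlock idea. It is not the code in the diff: `DeferredLockSketch`, `intercept`, `HRC_CACHE_LOCK`, and the `EVENTS` trace are hypothetical names, and the transaction is modelled as a thread-local flag rather than a real JPA transaction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the transactional interceptor; not Ambari's class. */
final class DeferredLockSketch {
  /** Trace of what happened, in order (single-threaded demo only). */
  static final List<String> EVENTS = new ArrayList<>();

  /** Assumed lock guarding the status summary cache. */
  static final ReentrantLock HRC_CACHE_LOCK = new ReentrantLock();

  /** Locks this thread has taken while its transaction is still open. */
  private static final ThreadLocal<Deque<ReentrantLock>> HELD =
      ThreadLocal.withInitial(ArrayDeque::new);

  /** True while this thread is inside a (modelled) transaction. */
  private static final ThreadLocal<Boolean> IN_TX =
      ThreadLocal.withInitial(() -> false);

  /**
   * Runs body as an intercepted method. Pass a lock to model a method that is
   * annotated with a transactional lock, or null for a plain transactional one.
   */
  static void intercept(String name, ReentrantLock lock, Runnable body) {
    boolean startedTx = !IN_TX.get();
    if (startedTx) {
      IN_TX.set(true);
      EVENTS.add(name + ".begin");
    }
    if (lock != null) {
      lock.lock();              // taken on the way down the stack
      HELD.get().push(lock);    // remembered until the outer commit
      EVENTS.add(name + ".lock");
    }
    boolean success = false;
    try {
      body.run();               // the proceed() step
      success = true;
    } finally {
      if (startedTx) {
        // Only the interceptor that began the transaction ends it ...
        EVENTS.add(name + (success ? ".commit" : ".rollback"));
        // ... and only then are the locks collected along the way released.
        Deque<ReentrantLock> held = HELD.get();
        while (!held.isEmpty()) {
          held.pop().unlock();
          EVENTS.add("unlock");
        }
        IN_TX.set(false);
      }
    }
  }
}
```

Calling `intercept("foo", null, () -> intercept("bar", HRC_CACHE_LOCK, () -> {}))` records `foo.begin`, `bar.lock`, `foo.commit`, `unlock`, which matches the ordering in the workflow: the inner method takes the lock, but the release waits for the outer commit.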


Bugs: AMBARI-15173
    https://issues.apache.org/jira/browse/AMBARI-15173


Repository: ambari


Description
-------

While performing an upgrade, it's possible for the status of a 
request/stage to not match that of its tasks. Essentially, the task could be 
{{HOLDING}} while the request is still {{IN_PROGRESS}}.

I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 
introduced, among other things, a cache for the 
{{HostRoleCommandStatusSummaryDTO}}, which is an aggregation of the number of 
tasks a stage has in each state (PENDING, HOLDING, etc).

This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to 
calculate a stage's and request's status based on the tasks. 

The problem is that {{ServerActionExecutor}} is moving a task's state to 
{{HOLDING}} (reflected in the database correctly) but the cache invalidation 
happens inside the uncommitted transaction. This causes stale data to be 
re-cached. So, when we go to calculate the request and stage status, we get 
{{IN_PROGRESS}} instead of {{HOLDING}}.

{code}
{
  "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
  "Stage": {
    "cluster_name": "cl1",
    "context": "Stop YARN Queues",
    "display_status": "IN_PROGRESS",
    "end_time": -1,
    "progress_percent": 35,
    "request_id": 61,
    "skippable": true,
    "stage_id": 1,
    "start_time": 1456227329191,
    "status": "IN_PROGRESS"
  },
  "tasks": [
    {
      "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
      "Tasks": {
        "attempt_cnt": 1,
        "cluster_name": "cl1",
        "command": "EXECUTE",
        "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this step since the clients will retry on their own.",
        "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
        "end_time": -1,
        "error_log": "errors-754.txt",
        "exit_code": 0,
        "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
        "id": 754,
        "output_log": "output-754.txt",
        "request_id": 61,
        "role": "AMBARI_SERVER_ACTION",
        "stage_id": 1,
        "start_time": 1456227329191,
        "status": "HOLDING",
        "stderr": "",
        "stdout": "",
        "structured_out": {}
      }
    }
  ]
}
{code}
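
The ordering that produces this mismatch can be replayed deterministically. The following is only a simplified model, assuming a plain map stands in for the database and a status string stands in for the {{HostRoleCommandStatusSummaryDTO}}; `StaleCacheSketch`, `readStatus`, and `replay` are hypothetical names, not Ambari code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the stale re-cache; the names are not Ambari's. */
final class StaleCacheSketch {
  /** Committed state as other threads see it, keyed by stage id. */
  static final Map<Long, String> committed = new HashMap<>();

  /** Cached status summary, keyed by stage id. */
  static final Map<Long, String> cache = new HashMap<>();

  /** Reader path: serve from the cache, loading committed data on a miss. */
  static String readStatus(long stageId) {
    return cache.computeIfAbsent(stageId, committed::get);
  }

  /** Replays the problematic ordering and returns what a reader sees afterwards. */
  static String replay(long stageId) {
    committed.put(stageId, "IN_PROGRESS");
    readStatus(stageId);              // cache is warmed with IN_PROGRESS

    String pending = "HOLDING";       // writer changes the task inside its transaction
    cache.remove(stageId);            // invalidation runs before the commit

    readStatus(stageId);              // another reader re-caches the old committed value

    committed.put(stageId, pending);  // the transaction commits only now
    return readStatus(stageId);       // served from the cache, which is now stale
  }
}
```

In this model `replay(1L)` returns `IN_PROGRESS` even though the committed value is `HOLDING`, which is the same disagreement visible between the stage and its task in the response above.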


Diffs (updated)
-----

  ambari-web/app/styles/application.less 3a49d5c 

Diff: https://reviews.apache.org/r/43967/diff/


Testing
-------

Pending unit tests...


Thanks,

Jonathan Hurley
