> On Feb. 24, 2016, 7:22 p.m., Sid Wagle wrote:
> > Although I like the pattern here because it is re-usable, isn't this caused
> > by a missing Readlock on the cache re-load? Could this not have been
> > handled correctly with application level read/write locks? The reader which
> > loads the cache needs to wait on the writeLock and the Transaction boundary
> > is within the write lock to gurantee the transaction has committed already.
> > This does achieve the end result my only concern is introduciton of another
> > locking pattern and then un-intentional or mis-guided re-use by someone
> > else.
Thanks for the review, Sid. The problem is actually caused because of how the
@Transactional method wraps around the DAO method. Consider this case:
```
@Transactional
public void foo(Entity entity){
EntityManager.merge(entity)
cache.invalidate(entity)
}
```
On the surface, this looks fine. But what's actually happening is that the
cache is being invalidated before the transaction is committed. Now what
happens if a read comes in after the cache invalidation, but before the
@Transactional interceptor is able to commit the transaction? The cache will
get the old value read from the database and cached as new, which causes the
problem.
Even if I were to introduce a read lock in the cache, the above `foo()` method
would have to release its wrote lock before exiting. And that still happens
_before_ the transaction is committed. So, anything waiting on the read lock
would then be notified to start work, effectively reading a non-committed
transaction. It's the same problem.
- Jonathan
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/#review120624
-----------------------------------------------------------
On Feb. 24, 2016, 6:53 p.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43967/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2016, 6:53 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty,
> Sebastian Toader, and Sid Wagle.
>
>
> Bugs: AMBARI-15173
> https://issues.apache.org/jira/browse/AMBARI-15173
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Seen while performing an upgrade, it's possible that the status of a
> request/stage does not match that of its tasks. Essentially, the task could
> be {{HOLDING}} while the request is still {{IN_PROGRESS}}.
>
> I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011
> introduced, among other things, a cache to the
> {{HostRoleCommandStatusSummaryDTO}} which is a aggregation of the number of
> tasks a stage has in each state (PENDING, HOLDING, etc).
>
> This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to
> calculate a stage's and request's status based on the tasks.
>
> The problem is that {{ServerActionExecutor}} is moving a tasks's state to
> {{HOLDING}} (reflected in the database correctly) but the cache invalidation
> happens inside the uncommitted transaction. This causes stale data to be
> re-cached. So, when we go to calculate the request and state status, we get
> {{IN_PROGRESS}} instead of {{HOLDING}}.
>
> {code}
> {
> "href":
> "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
> "Stage": {
> "cluster_name": "cl1",
> "context": "Stop YARN Queues",
> "display_status": "IN_PROGRESS",
> "end_time": -1,
> "progress_percent": 35,
> "request_id": 61,
> "skippable": true,
> "stage_id": 1,
> "start_time": 1456227329191,
> "status": "IN_PROGRESS"
> },
> "tasks": [
> {
> "href":
> "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
> "Tasks": {
> "attempt_cnt": 1,
> "cluster_name": "cl1",
> "command": "EXECUTE",
> "command_detail": "Before continuing, please stop all YARN queues. If
> yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to
> true, then you can skip this step since the clients will retry on their own.",
> "custom_command_name":
> "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
> "end_time": -1,
> "error_log": "errors-754.txt",
> "exit_code": 0,
> "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
> "id": 754,
> "output_log": "output-754.txt",
> "request_id": 61,
> "role": "AMBARI_SERVER_ACTION",
> "stage_id": 1,
> "start_time": 1456227329191,
> "status": "HOLDING",
> "stderr": "",
> "stdout": "",
> "structured_out": {}
> }
> }
> ]
> }
> {code}
>
>
> Diffs
> -----
>
>
> ambari-server/src/main/java/com/google/inject/persist/jpa/AmbariJpaPersistModule.java
> 4e4dd35
>
> ambari-server/src/main/java/org/apache/ambari/annotations/TransactionalLock.java
> PRE-CREATION
>
> ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java
> 6d7901c
>
> ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLockInterceptor.java
> PRE-CREATION
>
> ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLocks.java
> PRE-CREATION
>
> ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
> deca9b1
> ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.4.xml
> 29ebc1f
>
> Diff: https://reviews.apache.org/r/43967/diff/
>
>
> Testing
> -------
>
> Pending unit tests...
>
>
> Thanks,
>
> Jonathan Hurley
>
>