-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34677/
-----------------------------------------------------------

Review request for Ambari, Robert Nettleton and Tom Beerbower.


Bugs: AMBARI-11394
    https://issues.apache.org/jira/browse/AMBARI-11394


Repository: ambari


Description
-------

Provisioning a cluster may occasionally fail to complete as a result of an
out-of-order database write.
This error presents itself as one or more start tasks that never progress
beyond the PENDING state. For these logical pending tasks, there are no
associated physical tasks.
When a host is matched to a host request, an install request is submitted,
followed immediately by a start request. The install task transitions the
desired_state of all host components on the host from INIT to INSTALLED.
However, because of an error in the persistence layer, after the desired_state
is set to INSTALLED, it is overwritten with INIT on another thread (the
heartbeat handler thread). As a result, the component is never started: its
desired state is INIT, so it isn't processed by the start operation.
The root cause is that the public method
ServiceComponentHostImpl.handleEvent() is annotated with '@Transactional'.
The proper locks are acquired inside this method, but because the method is
marked @Transactional, its invocation is wrapped in a proxy that runs the
method inside a transaction. As a result, the transaction is committed in the
proxy after the method returns, outside of any synchronization, which allows
out-of-order writes.
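
To illustrate the ordering problem described above, here is a minimal,
self-contained sketch. It is not Ambari code and does not reproduce the patch:
the class CommitOrderingSketch, its fields, and the afterUnlock hook are all
hypothetical, and a plain field assignment stands in for the transaction
commit. Latches force the interleaving so the result is deterministic: the
"heartbeat" thread takes the lock first with INIT, and the "install" call takes
it second with INSTALLED, so the lock order says INSTALLED should win.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class CommitOrderingSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // Hypothetical stand-in for the persisted desired_state value.
    private volatile String persistedDesiredState = "INIT";

    // Hypothetical stand-in for handleEvent(). When commitInsideLock is false,
    // the write lands after the lock is released, as it would when a
    // transactional proxy commits after the method returns.
    void handleEvent(String desiredState, boolean commitInsideLock, Runnable afterUnlock) {
        lock.lock();
        try {
            if (commitInsideLock) {
                persistedDesiredState = desiredState; // commit while holding the lock
            }
        } finally {
            lock.unlock();
        }
        afterUnlock.run(); // window between releasing the lock and committing
        if (!commitInsideLock) {
            persistedDesiredState = desiredState; // commit with no synchronization
        }
    }

    // Runs the forced interleaving and returns the final persisted state.
    public static String run(boolean commitInsideLock) {
        CommitOrderingSketch sketch = new CommitOrderingSketch();
        CountDownLatch heartbeatUnlocked = new CountDownLatch(1);
        CountDownLatch installCommitted = new CountDownLatch(1);

        // "Heartbeat" thread: goes through the lock first, then stalls
        // until the install write has been committed.
        Thread heartbeat = new Thread(() ->
            sketch.handleEvent("INIT", commitInsideLock, () -> {
                heartbeatUnlocked.countDown();
                try {
                    installCommitted.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        heartbeat.start();

        try {
            heartbeatUnlocked.await();
            // "Install" call: goes through the lock second and commits.
            sketch.handleEvent("INSTALLED", commitInsideLock, () -> { });
            installCommitted.countDown();
            heartbeat.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.persistedDesiredState;
    }

    public static void main(String[] args) {
        System.out.println("commit outside lock: " + run(false));
        System.out.println("commit inside lock:  " + run(true));
    }
}
```

With the commit outside the lock, the stalled heartbeat write lands last and
the state ends up as INIT even though the install write acquired the lock
later. With the commit inside the lock, the commit order matches the lock order
and the state ends up as INSTALLED.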


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java dd06eb5

Diff: https://reviews.apache.org/r/34677/diff/


Testing
-------

- provisioned clusters via BP
- currently re-running unit test suite and will update with results prior to 
merging


Thanks,

John Speidel
