-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34677/#review85233
-----------------------------------------------------------

Ship it!


- Tom Beerbower


On May 26, 2015, 7:42 p.m., John Speidel wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34677/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 7:42 p.m.)
> 
> 
> Review request for Ambari, Robert Nettleton and Tom Beerbower.
> 
> 
> Bugs: AMBARI-11394
>     https://issues.apache.org/jira/browse/AMBARI-11394
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Provisioning a cluster may occasionally fail to complete as a result of an
> out-of-order database write.
> This error presents itself as start task(s) that never progress beyond the
> PENDING state. For these logical pending tasks, there are no associated
> physical tasks.
> When a host is matched to a host request, an install request is submitted,
> followed immediately by a start request. The install task transitions the
> desired_state of all host components on the host from INIT to INSTALLED.
> However, because of an error in the persistence layer, after desired_state is
> set to INSTALLED, it is overwritten with INIT on another thread (the heartbeat
> handler thread). As a result, the component is never started: its desired
> state is INIT, so it isn't processed by the start operation.
> The root cause is that the public method
> ServiceComponentHostImpl.handleEvent() is annotated with '@Transactional'.
> Inside this method the proper locks are acquired, BUT because the method
> is marked @Transactional, its invocation is wrapped in a proxy that runs
> the method inside a transaction. As a result, the transaction is
> committed in the proxy after the method returns, outside of any
> synchronization, which allows out-of-order writes.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java dd06eb5
> 
> Diff: https://reviews.apache.org/r/34677/diff/
> 
> 
> Testing
> -------
> 
> - provisioned clusters via Blueprints (BP)
> - currently re-running the unit test suite; will update with results prior to
>   merging
> 
> Because this is a timing issue that, according to a user, occurs only once
> every ~150 clusters, and I have been unable to reproduce it, I wasn't able to
> verify that this patch completely fixes the issue. But I can say with
> certainty that the issue that was fixed could manifest itself precisely
> as the bug describes.
> 
> 
> Thanks,
> 
> John Speidel
> 
>
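The ordering problem described in the request can be illustrated with a small, single-threaded Java simulation. This is a sketch under stated assumptions, not Ambari code: `Db`, `Component`, `handleEventLocked` and `handleEventCommitted` are hypothetical names, and the "transaction" is modeled as a `Runnable` that flushes a snapshot of the in-memory state to a stand-in database row. The two interleavings are replayed deterministically by hand rather than with real threads.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical simulation (not Ambari code) of why committing a transaction
 * after the lock is released allows a stale write to land last.
 */
public class OutOfOrderCommitDemo {

    enum State { INIT, INSTALLED }

    /** Stand-in for the database row holding desired_state. */
    static final class Db {
        State desiredState = State.INIT;
    }

    /** Stand-in for a host component: in-memory state guarded by a lock. */
    static final class Component {
        final ReentrantLock lock = new ReentrantLock();
        final Db db;
        State desiredState = State.INIT;

        Component(Db db) { this.db = db; }

        /**
         * Runs the locked part of an event handler and returns the pending
         * commit. The caller decides when the commit runs, mimicking a proxy
         * that commits only after the method has returned.
         */
        Runnable handleEventLocked(State newState) {
            lock.lock();
            try {
                if (newState != null) {
                    desiredState = newState;
                }
                State snapshot = desiredState; // value this "transaction" will flush
                return () -> db.desiredState = snapshot;
            } finally {
                lock.unlock();
            }
        }

        /** Fixed variant: the commit runs while the lock is still held. */
        void handleEventCommitted(State newState) {
            lock.lock();
            try {
                // ReentrantLock allows the nested acquire inside handleEventLocked.
                handleEventLocked(newState).run();
            } finally {
                lock.unlock();
            }
        }
    }

    /** Buggy ordering: each commit happens after its lock was released. */
    static State commitOutsideLock() {
        Db db = new Db();
        Component c = new Component(db);
        Runnable heartbeatCommit = c.handleEventLocked(null);          // heartbeat snapshots INIT
        Runnable installCommit = c.handleEventLocked(State.INSTALLED); // install sets INSTALLED
        installCommit.run();   // install transaction commits first
        heartbeatCommit.run(); // stale heartbeat transaction commits last
        return db.desiredState;
    }

    /** Fixed ordering: the lock serializes the state change and the commit. */
    static State commitInsideLock() {
        Db db = new Db();
        Component c = new Component(db);
        c.handleEventCommitted(null);            // heartbeat writes INIT
        c.handleEventCommitted(State.INSTALLED); // install writes INSTALLED
        return db.desiredState;
    }

    public static void main(String[] args) {
        System.out.println("commit outside lock: " + commitOutsideLock()); // INIT
        System.out.println("commit inside lock:  " + commitInsideLock());  // INSTALLED
    }
}
```

In the first scenario the in-memory state is INSTALLED, but the persisted desired_state ends up INIT, which matches the symptom in the description. Holding the lock until the commit has finished removes that interleaving.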
