> On May 26, 2015, 7:50 p.m., Robert Levas wrote:
> > Ship It!

Forgot to add test results:

Results :

Tests run: 3011, Failures: 0, Errors: 0, Skipped: 21

...

----------------------------------------------------------------------
Total run:743
Total errors:0
Total failures:0


- John


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34677/#review85239
-----------------------------------------------------------


On May 26, 2015, 7:42 p.m., John Speidel wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34677/
> -----------------------------------------------------------
>
> (Updated May 26, 2015, 7:42 p.m.)
>
>
> Review request for Ambari, Robert Nettleton and Tom Beerbower.
>
>
> Bugs: AMBARI-11394
>     https://issues.apache.org/jira/browse/AMBARI-11394
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Provisioning a cluster may occasionally fail to complete as a result of an
> out-of-order database write.
> This error presents itself as start task(s) that never progress beyond the
> PENDING state. For these logical pending tasks, there are no associated
> physical tasks.
> When a host is matched to a host request, an install request is submitted,
> followed immediately by a start request. The install task transitions the
> desired_state of all host components for the host from INIT to INSTALLED.
> But, because of an error in the persistence layer, after the desired_state
> is set to INSTALLED, it is overwritten on another thread (the heartbeat
> handler thread) to INIT. As a result, the component is never started,
> because its desired state is INIT and it isn't processed by the start
> operation.
> The root cause of this is that the public method
> ServiceComponentHostImpl.handleEvent() is annotated with '@Transactional'.
> Inside of this method the proper locks are acquired, but because this method
> is marked as @Transactional, its invocation is wrapped in a proxy which wraps
> the method invocation in a transaction. As a result, the transaction is
> committed in the proxy after the method returns, outside of any
> synchronization, which allows for out-of-order writes.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java dd06eb5
>
> Diff: https://reviews.apache.org/r/34677/diff/
>
>
> Testing
> -------
>
> - provisioned clusters via BP
> - currently re-running unit test suite and will update with results prior to
>   merging
>
> Because this is a timing issue which, according to a user, occurs for them
> only once every ~150 clusters, and I have been unable to reproduce it, I
> wasn't able to verify that this patch completely fixes the issue. But I can
> say with certainty that the issue that was fixed could manifest itself
> precisely as the bug describes.
>
>
> Thanks,
>
> John Speidel
>
>
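
To make the ordering problem in the description concrete, here is a minimal
sketch of how a commit that happens after the lock is released can land out of
order. It is not Ambari code: CommitOrderSketch, Tx, and both handleEvent*
methods are hypothetical names, the "database" is a single static field, and
the interleaving is replayed on one thread so the outcome is deterministic.
The second variant shows one plausible shape for a fix (committing before the
lock is released); the review does not say that this is what the patch does.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; none of these names exist in Ambari. */
class CommitOrderSketch {
    /** Stand-in for the committed desired_state column. */
    static String committedState = "INIT";
    static final ReentrantLock lock = new ReentrantLock();

    /** A pending write that becomes visible only when commit() is called. */
    static final class Tx {
        private final String valueToWrite;
        Tx(String valueToWrite) { this.valueToWrite = valueToWrite; }
        void commit() { committedState = valueToWrite; }
    }

    /** Problem shape: the lock is released while the transaction is still open. */
    static Tx handleEventCommitOutsideLock(String newState) {
        lock.lock();
        try {
            return new Tx(newState);    // state decided under the lock
        } finally {
            lock.unlock();              // caller (the "proxy") commits later
        }
    }

    /** Assumed alternative: the commit completes before the lock is released. */
    static void handleEventCommitInsideLock(String newState) {
        lock.lock();
        try {
            new Tx(newState).commit();
        } finally {
            lock.unlock();
        }
    }

    /** Replays the bad interleaving: the stale INIT write is committed last. */
    static String commitOutsideLockResult() {
        committedState = "INIT";
        Tx heartbeat = handleEventCommitOutsideLock("INIT");
        Tx install = handleEventCommitOutsideLock("INSTALLED");
        install.commit();      // install's commit lands first
        heartbeat.commit();    // heartbeat's older write overwrites it
        return committedState;
    }

    /** Same event order, but each commit is serialized by the lock. */
    static String commitInsideLockResult() {
        committedState = "INIT";
        handleEventCommitInsideLock("INIT");
        handleEventCommitInsideLock("INSTALLED");
        return committedState;
    }

    public static void main(String[] args) {
        System.out.println("commit outside lock: " + commitOutsideLockResult());
        System.out.println("commit inside lock:  " + commitInsideLockResult());
    }
}
```

In the first replay the lock orders the method bodies but not the commits, so
the final value depends on which commit runs last. In the second, the write
order always matches the lock acquisition order.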
