John Speidel created AMBARI-11394:
-------------------------------------
Summary: Blueprint cluster provision occasionally fails due to out
of order database writes
Key: AMBARI-11394
URL: https://issues.apache.org/jira/browse/AMBARI-11394
Project: Ambari
Issue Type: Bug
Affects Versions: 2.1.0
Reporter: John Speidel
Assignee: John Speidel
Fix For: 2.1.0
Provisioning a cluster may occasionally fail to complete as a result of an
out-of-order database write.
This error presents itself as one or more start tasks that never progress
beyond the PENDING state. For these logical pending tasks, there are no
associated physical tasks.
When a host is matched to a host request, an install request is submitted,
followed immediately by a start request. The install task transitions the
desired_state of all host components on the host from INIT to INSTALLED. But,
because of an error in the persistence layer, after the desired_state is set to
INSTALLED, it is overwritten with INIT on another thread (the heartbeat handler
thread). As a result, the component is never started because its desired state
is INIT and it isn't processed by the start operation.
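To make the symptom concrete, here is a minimal sketch of the selection step
described above. It is not Ambari code: the class, the method, and the
component names are hypothetical, and the only rule it encodes is the one
stated in this report, namely that a component whose desired state is still
INIT is skipped by the start operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Ambari code: all names here are illustrative.
final class StartSelectionSketch {
    enum DesiredState { INIT, INSTALLED }

    // Assumed rule, taken from the description above: a component whose
    // desired state is still INIT is not processed by the start operation.
    static List<String> componentsToStart(Map<String, DesiredState> desiredStates) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, DesiredState> entry : desiredStates.entrySet()) {
            if (entry.getValue() != DesiredState.INIT) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

If every component on the host has been written back to INIT, the returned
list is empty, which would match a logical start task that stays PENDING with
no physical tasks behind it.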
The root cause of this is that the public method
ServiceComponentHostImpl.handleEvent() is annotated with '@Transactional'.
Inside this method the proper locks are acquired, BUT because the method is
marked as @Transactional, its invocation is wrapped in a proxy which starts and
commits a transaction around the method. As a result, the transaction is
committed in the proxy, outside of any synchronization, which allows for
out-of-order writes.
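The interleaving can be replayed deterministically in a small model. This is
only a sketch under stated assumptions, not the Ambari implementation: the
class and method names are hypothetical, a plain field stands in for the
persisted desired_state, and the transactional proxy is reduced to a separate
"commit" step that runs after the lock has been released. The second variant
shows one possible shape in which the write becomes durable while the lock is
still held; it is not claimed to be the fix applied for this issue.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model, not Ambari code: all names here are illustrative.
final class CommitOrderSketch {
    // Stands in for the persisted desired_state column.
    private String persistedDesiredState = "INIT";
    private final ReentrantLock lock = new ReentrantLock();

    // Problematic shape: the lock covers only the method body. The returned
    // value models a write staged in the transaction but not yet committed.
    String handleEventBody(String newDesiredState) {
        lock.lock();
        try {
            return newDesiredState;
        } finally {
            lock.unlock();
        }
    }

    // Models the proxy committing the transaction with no lock held.
    void proxyCommit(String stagedValue) {
        persistedDesiredState = stagedValue;
    }

    // Alternative shape: the write is made durable before the lock is released.
    void handleEventCommitInsideLock(String newDesiredState) {
        lock.lock();
        try {
            persistedDesiredState = newDesiredState;
        } finally {
            lock.unlock();
        }
    }

    String persisted() {
        return persistedDesiredState;
    }

    // Replays the failure: lock order is heartbeat then install,
    // but commit order is install then heartbeat.
    static String replayCommitOutsideLock() {
        CommitOrderSketch s = new CommitOrderSketch();
        String heartbeatWrite = s.handleEventBody("INIT");     // stale value
        String installWrite = s.handleEventBody("INSTALLED");
        s.proxyCommit(installWrite);
        s.proxyCommit(heartbeatWrite); // lands last and overwrites INSTALLED
        return s.persisted();
    }

    // Same lock order, but each write is committed before the lock is released.
    static String replayCommitInsideLock() {
        CommitOrderSketch s = new CommitOrderSketch();
        s.handleEventCommitInsideLock("INIT");
        s.handleEventCommitInsideLock("INSTALLED");
        return s.persisted();
    }
}
```

In the first replay the lock serializes the two method bodies correctly, yet
the final persisted value is INIT because the commits are unordered. In the
second replay the commit order is forced to match the lock order, so the final
value is INSTALLED.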
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)