[
https://issues.apache.org/jira/browse/AMBARI-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640779#comment-14640779
]
Hadoop QA commented on AMBARI-12526:
------------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12747021/AMBARI-12526.patch
against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
ambari-server.
Test results:
https://builds.apache.org/job/Ambari-trunk-test-patch/3463//testReport/
Console output:
https://builds.apache.org/job/Ambari-trunk-test-patch/3463//console
This message is automatically generated.
> Ambari Cluster Deployment Stuck At 2% With A SQL Deadlock When Talking to SQL
> Azure
> -----------------------------------------------------------------------------------
>
> Key: AMBARI-12526
> URL: https://issues.apache.org/jira/browse/AMBARI-12526
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.0
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Blocker
> Fix For: 2.1.1
>
> Attachments: AMBARI-12526.patch
>
>
> When deploying a new cluster on SQL Azure, there is a recurring deadlock on
> the SQL Server:
> {code}
> 15 Jul 2015 22:13:31,453 ERROR [ambari-action-scheduler]
> AmbariJpaLocalTxnInterceptor:114 - [DETAILED ERROR] Rollback reason:
> Local Exception Stack:
> Exception [EclipseLink-4002] (Eclipse Persistence Services -
> 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: com.microsoft.sqlserver.jdbc.SQLServerException:
> Transaction (Process ID 62) was deadlocked on lock resources with another
> process and has been chosen as the deadlock victim. Rerun the transaction.
> Error Code: 1205
> Call: UPDATE hostcomponentstate SET current_state = ? WHERE
> ((((component_name = ?) AND (host_id = ?)) AND (cluster_id = ?)) AND
> (service_name = ?))
> bind => [5 parameters bound]
> at
> org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:331)
> at
> org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:900)
> at
> org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:962)
> at
> org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:631)
> at
> org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:149)
> at
> org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)
> at
> org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1836)
> at
> org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4244)
> at
> org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5594)
> at
> org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.writeChanges(RepeatableWriteUnitOfWork.java:453)
> at
> org.eclipse.persistence.internal.jpa.EntityManagerImpl.flush(EntityManagerImpl.java:863)
> at
> org.eclipse.persistence.internal.jpa.QueryImpl.performPreQueryFlush(QueryImpl.java:963)
> at
> org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:207)
> at
> org.eclipse.persistence.internal.jpa.QueryImpl.getSingleResult(QueryImpl.java:517)
> at
> org.eclipse.persistence.internal.jpa.EJBQueryImpl.getSingleResult(EJBQueryImpl.java:400)
> at org.apache.ambari.server.orm.dao.DaoUtils.selectOne(DaoUtils.java:80)
> at org.apache.ambari.server.orm.dao.StackDAO.find(StackDAO.java:93)
> at
> org.apache.ambari.server.orm.AmbariLocalSessionInterceptor.invoke(AmbariLocalSessionInterceptor.java:53)
> at
> org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.setStackVersion(ServiceComponentHostImpl.java:1058)
> at
> org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl$ServiceComponentHostOpStartedTransition.transition(ServiceComponentHostImpl.java:628)
> at
> org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl$ServiceComponentHostOpStartedTransition.transition(ServiceComponentHostImpl.java:610)
> at
> org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
> at
> org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
> at
> org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
> at
> org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
> at
> org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.handleEvent(ServiceComponentHostImpl.java:901)
> at
> org.apache.ambari.server.state.cluster.ClusterImpl.processServiceComponentHostEvents(ClusterImpl.java:2508)
> at
> org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:68)
> at
> org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:343)
> at
> org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:195)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> It's just a JPA entity merge that's doing this. The whole transaction is:
> {code}
> UPDATE hostcomponentstate SET version = ? WHERE ((((component_name = ?) AND
> (host_id = ?)) AND (cluster_id = ?)) AND (service_name = ?))
> {code}
> There is a CLUSTERED index on {{component_name}}, {{host_id}},
> {{cluster_id}}, and {{service_name}}, so the predicate of the query results
> in a Clustered Index Seek (that's good, since it should take a single lock).
> The {{UPDATE}}, however, then results in a Clustered Index Update (which is a
> {{DELETE}} followed by an {{INSERT}}, no?).
> Essentially, we have concurrent {{UPDATE}} statements in separate
> transactions acting on different rows of {{hostcomponentstate}}. This seems
> to cause a deadlock because both processes hold an X lock and then try to
> acquire a U lock. The U lock is what makes me think they are trying to
> acquire the table lock in order to update the clustered index.
> In any event, it seems like the deadlock is caused by the escalation of a
> lock (whether to a page or a table lock); both processes hold exclusive
> row-level key locks, and then each requests a U lock on which it blocks.
> Below is the deadlock graph XML:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <deadlock-list>
> <deadlock victim="process10faf6ca8">
> <process-list>
> <process id="process10faf6ca8" taskpriority="0" logused="344"
> waitresource="KEY: 5:72057594165723136 (fd5c95c6a91a)" waittime="4391"
> ownerId="16290998" transactionname="implicit_transaction"
> lasttranstarted="2015-07-22T00:18:01.547" XDES="0x1177363b0" lockMode="U"
> schedulerid="2" kpid="3324" status="suspended" spid="54" sbid="0" ecid="0"
> priority="0" trancount="2" lastbatchstarted="2015-07-22T00:18:01.547"
> lastbatchcompleted="2015-07-22T00:18:01.547"
> lastattention="1900-01-01T00:00:00.547" clientapp="Microsoft JDBC Driver for
> SQL Server" hostname="headnode0" hostpid="0"
> loginname="[email protected]" isolationlevel="read
> committed (2)" xactid="16290998" currentdb="5" lockTimeout="4294967295"
> clientoption1="671088672" clientoption2="128058">
> <executionStack>
> <frame procname="adhoc" line="1" stmtstart="160" stmtend="450"
> sqlhandle="0x02000000b5fa61048c97ed65c734e84590264b91bbf93aca0000000000000000000000000000000000000000">unknown</frame>
> <frame procname="unknown" line="1"
> sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">unknown</frame>
> </executionStack>
> <inputbuf>(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 bigint,@P3
> bigint,@P4 nvarchar(4000))UPDATE hostcomponentstate SET version = @P0 WHERE
> ((((component_name = @P1) AND (host_id = @P2)) AND (cluster_id = @P3)) AND
> (service_name = @P4))</inputbuf>
> </process>
> <process id="process10dfa1c28" taskpriority="0" logused="520"
> waitresource="KEY: 5:72057594165723136 (930b38dc45e3)" waittime="4384"
> ownerId="16290959" transactionname="implicit_transaction"
> lasttranstarted="2015-07-22T00:18:01.537" XDES="0x1199b43b0" lockMode="U"
> schedulerid="1" kpid="3364" status="suspended" spid="61" sbid="0" ecid="0"
> priority="0" trancount="2" lastbatchstarted="2015-07-22T00:18:01.553"
> lastbatchcompleted="2015-07-22T00:18:01.553"
> lastattention="1900-01-01T00:00:00.553" clientapp="Microsoft JDBC Driver for
> SQL Server" hostname="headnode0" hostpid="0"
> loginname="[email protected]" isolationlevel="read
> committed (2)" xactid="16290959" currentdb="5" lockTimeout="4294967295"
> clientoption1="671088672" clientoption2="128058">
> <executionStack>
> <frame procname="adhoc" line="1" stmtstart="160" stmtend="462"
> sqlhandle="0x02000000e94be81b7f4521ee46bc4614740d406162aad19d0000000000000000000000000000000000000000">unknown</frame>
> <frame procname="unknown" line="1"
> sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">unknown</frame>
> </executionStack>
> <inputbuf>(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 bigint,@P3
> bigint,@P4 nvarchar(4000))UPDATE hostcomponentstate SET current_state = @P0
> WHERE ((((component_name = @P1) AND (host_id = @P2)) AND (cluster_id = @P3))
> AND (service_name = @P4))</inputbuf>
> </process>
> </process-list>
> <resource-list>
> <keylock hobtid="72057594165723136" dbid="5"
> objectname="ambari.dbo.hostcomponentstate"
> indexname="PK__hostcomp__C72E1492ED078925" id="lock113b1de80" mode="X"
> associatedObjectId="72057594165723136">
> <owner-list>
> <owner id="process10dfa1c28" mode="X" />
> </owner-list>
> <waiter-list>
> <waiter id="process10faf6ca8" mode="U" requestType="wait" />
> </waiter-list>
> </keylock>
> <keylock hobtid="72057594165723136" dbid="5"
> objectname="ambari.dbo.hostcomponentstate"
> indexname="PK__hostcomp__C72E1492ED078925" id="lock110cf0b00" mode="X"
> associatedObjectId="72057594165723136">
> <owner-list>
> <owner id="process10faf6ca8" mode="X" />
> </owner-list>
> <waiter-list>
> <waiter id="process10dfa1c28" mode="U" requestType="wait" />
> </waiter-list>
> </keylock>
> </resource-list>
> </deadlock>
> </deadlock-list>
> {code}
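The wait-for cycle in the graph above can also be checked mechanically. The following is a minimal sketch, assuming the graph is available as a clean XML string and that each {{keylock}} has exactly one owner and one waiter, as it does here. {{DeadlockGraph}}, {{waitsFor}}, and {{inCycle}} are hypothetical names for illustration and are not part of Ambari or the attached patch.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for reading a SQL Server deadlock graph. */
final class DeadlockGraph {
    private DeadlockGraph() {}

    /**
     * Returns one "waiter id -> owner id" edge per keylock element.
     * Assumes a single owner and a single waiter per keylock.
     */
    static Map<String, String> waitsFor(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> edges = new LinkedHashMap<>();
            NodeList locks = doc.getElementsByTagName("keylock");
            for (int i = 0; i < locks.getLength(); i++) {
                Element lock = (Element) locks.item(i);
                Element owner = (Element) lock.getElementsByTagName("owner").item(0);
                Element waiter = (Element) lock.getElementsByTagName("waiter").item(0);
                if (owner != null && waiter != null) {
                    edges.put(waiter.getAttribute("id"), owner.getAttribute("id"));
                }
            }
            return edges;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable deadlock graph", e);
        }
    }

    /** True if following waiter -> owner edges from start leads back to start. */
    static boolean inCycle(Map<String, String> edges, String start) {
        String current = edges.get(start);
        for (int hops = 0; current != null && hops < edges.size(); hops++) {
            if (current.equals(start)) {
                return true;
            }
            current = edges.get(current);
        }
        return false;
    }
}
```

For the graph above, the two edges point at each other ({{process10faf6ca8}} waits on {{process10dfa1c28}} and vice versa), so either process id passed as {{start}} is reported as part of a cycle.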
> Both processes have an X lock on clustered index
> {{PK__hostcomp__C72E1492ED078925}} and then both try to obtain a U lock; this
> is because they are doing a Clustered Index Update and are requesting an
> escalation to a table lock. This is where the deadlock occurs.
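The server's message for error 1205 in the log above says to rerun the transaction. Below is a minimal sketch of that advice, assuming the caller can re-execute the whole unit of work from the start. {{DeadlockRetry}}, {{runWithRetry}}, and the backoff values are hypothetical and are not taken from Ambari or the attached patch. It mitigates the symptom only and does not address the locking pattern itself.

```java
import java.sql.SQLException;
import java.util.function.Supplier;

/** Hypothetical retry helper; illustration only. */
final class DeadlockRetry {
    /** SQL Server error code for "chosen as the deadlock victim", as seen in the log. */
    static final int SQLSERVER_DEADLOCK_VICTIM = 1205;

    private DeadlockRetry() {}

    /** Walks the cause chain looking for a SQLException carrying error 1205. */
    static boolean isDeadlockVictim(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException
                    && ((SQLException) c).getErrorCode() == SQLSERVER_DEADLOCK_VICTIM) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs txn, re-running it if it fails as a deadlock victim.
     * The supplier must begin and commit its own transaction on every call,
     * because the server has already rolled back the failed attempt.
     */
    static <T> T runWithRetry(Supplier<T> txn, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return txn.get();
            } catch (RuntimeException e) {
                if (!isDeadlockVictim(e) || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    // Linear backoff (assumed values) so the competing transaction can finish.
                    Thread.sleep(50L * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

The cause-chain walk matters because, as in the stack trace above, the {{SQLServerException}} arrives wrapped in an EclipseLink {{DatabaseException}} rather than as the top-level exception.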