[ https://issues.apache.org/jira/browse/AMBARI-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640779#comment-14640779 ]

Hadoop QA commented on AMBARI-12526:
------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12747021/AMBARI-12526.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
ambari-server.

Test results: 
https://builds.apache.org/job/Ambari-trunk-test-patch/3463//testReport/
Console output: 
https://builds.apache.org/job/Ambari-trunk-test-patch/3463//console

This message is automatically generated.

> Ambari Cluster Deployment Stuck At 2% With A SQL Deadlock When Talking to SQL Azure
> -----------------------------------------------------------------------------------
>
>                 Key: AMBARI-12526
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12526
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.1.1
>
>         Attachments: AMBARI-12526.patch
>
>
> When deploying a new cluster on SQL Azure, there is a recurring deadlock on 
> the SQL Server:
> {code}
> 15 Jul 2015 22:13:31,453 ERROR [ambari-action-scheduler] AmbariJpaLocalTxnInterceptor:114 - [DETAILED ERROR] Rollback reason: 
> Local Exception Stack: 
> Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 62) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
> Error Code: 1205
> Call: UPDATE hostcomponentstate SET current_state = ? WHERE ((((component_name = ?) AND (host_id = ?)) AND (cluster_id = ?)) AND (service_name = ?))
>       bind => [5 parameters bound]
>       at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:331)
>       at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:900)
>       at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:962)
>       at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:631)
>       at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:149)
>       at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)
>       at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1836)
>       at org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4244)
>       at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5594)
>       at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.writeChanges(RepeatableWriteUnitOfWork.java:453)
>       at org.eclipse.persistence.internal.jpa.EntityManagerImpl.flush(EntityManagerImpl.java:863)
>       at org.eclipse.persistence.internal.jpa.QueryImpl.performPreQueryFlush(QueryImpl.java:963)
>       at org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:207)
>       at org.eclipse.persistence.internal.jpa.QueryImpl.getSingleResult(QueryImpl.java:517)
>       at org.eclipse.persistence.internal.jpa.EJBQueryImpl.getSingleResult(EJBQueryImpl.java:400)
>       at org.apache.ambari.server.orm.dao.DaoUtils.selectOne(DaoUtils.java:80)
>       at org.apache.ambari.server.orm.dao.StackDAO.find(StackDAO.java:93)
>       at org.apache.ambari.server.orm.AmbariLocalSessionInterceptor.invoke(AmbariLocalSessionInterceptor.java:53)
>       at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.setStackVersion(ServiceComponentHostImpl.java:1058)
>       at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl$ServiceComponentHostOpStartedTransition.transition(ServiceComponentHostImpl.java:628)
>       at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl$ServiceComponentHostOpStartedTransition.transition(ServiceComponentHostImpl.java:610)
>       at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
>       at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
>       at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
>       at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
>       at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.handleEvent(ServiceComponentHostImpl.java:901)
>       at org.apache.ambari.server.state.cluster.ClusterImpl.processServiceComponentHostEvents(ClusterImpl.java:2508)
>       at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:68)
>       at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:343)
>       at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:195)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
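The driver message ends with "Rerun the transaction.", and 1205 is SQL Server's deadlock-victim error code. A common, generic mitigation is to re-run the whole unit of work when that code shows up anywhere in the exception's cause chain. The following is only a minimal sketch, not the change in AMBARI-12526.patch: the class and method names ({{DeadlockRetry}}, {{runWithDeadlockRetry}}, {{isDeadlockVictim}}) are hypothetical, and it assumes the wrapping persistence exception exposes the {{SQLServerException}} through {{getCause()}}. Retrying hides the symptom; it does not remove the lock cycle.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

public class DeadlockRetry {
    // SQL Server's vendor error code for "chosen as the deadlock victim".
    static final int SQLSERVER_DEADLOCK_VICTIM = 1205;

    // Hypothetical helper: re-runs `work` while it fails with error 1205,
    // up to maxAttempts calls in total; any other failure is rethrown at once.
    static <T> T runWithDeadlockRetry(Callable<T> work, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isDeadlockVictim(e)) {
                    throw e;
                }
                // Short linear back-off so the surviving transaction can commit.
                Thread.sleep(50L * attempt);
            }
        }
    }

    // Walks the cause chain, assuming the driver exception is reachable via getCause().
    static boolean isDeadlockVictim(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException
                    && ((SQLException) c).getErrorCode() == SQLSERVER_DEADLOCK_VICTIM) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = runWithDeadlockRetry(() -> {
            calls[0]++;
            if (calls[0] == 1) {
                // Simulated driver error wrapped by a persistence layer.
                throw new RuntimeException(
                        new SQLException("chosen as the deadlock victim", "40001", 1205));
            }
            return "committed";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running {{main}} prints "committed after 2 attempts": the first call fails with a simulated 1205 and the second succeeds. The back-off values are arbitrary placeholders.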
> It's just a JPA entity merge that's doing this. The whole transaction is:
> {code}
> UPDATE hostcomponentstate SET version = ? WHERE ((((component_name = ?) AND 
> (host_id = ?)) AND (cluster_id = ?)) AND (service_name = ?))
> {code}
> There is a CLUSTERED index on {{component_name}}, {{host_id}}, 
> {{cluster_id}}, and {{service_name}}, so the query's predicate results in a 
> Clustered Index Seek (that's good, since it should lock only a single row). 
> The {{UPDATE}}, however, then performs a Clustered Index Update (which is a 
> {{DELETE}} followed by an {{INSERT}}, no?).
> Essentially, we have concurrent {{UPDATE}} statements in separate 
> transactions acting on different rows of {{hostcomponentstate}}. This seems 
> to cause a deadlock because both processes hold an X lock and then try to 
> acquire a U lock. The U lock is what makes me think they are trying to 
> acquire a table lock in order to update the clustered index.
> In any event, the deadlock seems to be caused by the escalation of a lock 
> (to either a page or a table): both processes hold exclusive row-level key 
> locks, and then each requests a U lock, which blocks both of them.
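For the locking step described above, here is a small sketch of the compatibility rules SQL Server documents for shared (S), update (U) and exclusive (X) locks requested on the same resource. It covers only these three modes (intent, schema and range locks are left out), and the class name {{LockCompatibility}} is illustrative.

```java
public class LockCompatibility {
    // Subset of SQL Server lock modes relevant here: shared, update, exclusive.
    enum Mode { S, U, X }

    // True when a request for `requested` can be granted while another
    // transaction already holds `granted` on the same resource.
    static boolean compatible(Mode requested, Mode granted) {
        switch (requested) {
            case S:
                return granted != Mode.X;   // S coexists with S and U
            case U:
                return granted == Mode.S;   // only one U at a time, never with X
            default:
                return false;               // X conflicts with every mode
        }
    }

    public static void main(String[] args) {
        // Print one row per requested mode against each held mode.
        for (Mode req : Mode.values()) {
            StringBuilder row = new StringBuilder("request " + req + ":");
            for (Mode held : Mode.values()) {
                row.append(' ').append(held).append('=')
                   .append(compatible(req, held) ? "ok" : "wait");
            }
            System.out.println(row);
        }
    }
}
```

The U row prints "request U: S=ok U=wait X=wait". A U request waits behind an X lock held by another transaction on the same resource at any granularity, so two transactions that each hold X and each request U on what the other holds block one another until SQL Server picks a victim.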
> Below is the deadlock graph XML:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <deadlock-list>
>   <deadlock victim="process10faf6ca8">
>     <process-list>
>       <process id="process10faf6ca8" taskpriority="0" logused="344" 
> waitresource="KEY: 5:72057594165723136 (fd5c95c6a91a)" waittime="4391" 
> ownerId="16290998" transactionname="implicit_transaction" 
> lasttranstarted="2015-07-22T00:18:01.547" XDES="0x1177363b0" lockMode="U" 
> schedulerid="2" kpid="3324" status="suspended" spid="54" sbid="0" ecid="0" 
> priority="0" trancount="2" lastbatchstarted="2015-07-22T00:18:01.547" 
> lastbatchcompleted="2015-07-22T00:18:01.547" 
> lastattention="1900-01-01T00:00:00.547" clientapp="Microsoft JDBC Driver for 
> SQL Server" hostname="headnode0" hostpid="0" 
> loginname="[email protected]" isolationlevel="read 
> committed (2)" xactid="16290998" currentdb="5" lockTimeout="4294967295" 
> clientoption1="671088672" clientoption2="128058">
>         <executionStack>
>           <frame procname="adhoc" line="1" stmtstart="160" stmtend="450" 
> sqlhandle="0x02000000b5fa61048c97ed65c734e84590264b91bbf93aca0000000000000000000000000000000000000000">unknown</frame>
>           <frame procname="unknown" line="1" 
> sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">unknown</frame>
>         </executionStack>
>         <inputbuf>(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 bigint,@P3 
> bigint,@P4 nvarchar(4000))UPDATE hostcomponentstate SET version = @P0 WHERE 
> ((((component_name = @P1) AND (host_id = @P2)) AND (cluster_id = @P3)) AND 
> (service_name = @P4))</inputbuf>
>       </process>
>       <process id="process10dfa1c28" taskpriority="0" logused="520" 
> waitresource="KEY: 5:72057594165723136 (930b38dc45e3)" waittime="4384" 
> ownerId="16290959" transactionname="implicit_transaction" 
> lasttranstarted="2015-07-22T00:18:01.537" XDES="0x1199b43b0" lockMode="U" 
> schedulerid="1" kpid="3364" status="suspended" spid="61" sbid="0" ecid="0" 
> priority="0" trancount="2" lastbatchstarted="2015-07-22T00:18:01.553" 
> lastbatchcompleted="2015-07-22T00:18:01.553" 
> lastattention="1900-01-01T00:00:00.553" clientapp="Microsoft JDBC Driver for 
> SQL Server" hostname="headnode0" hostpid="0" 
> loginname="[email protected]" isolationlevel="read 
> committed (2)" xactid="16290959" currentdb="5" lockTimeout="4294967295" 
> clientoption1="671088672" clientoption2="128058">
>         <executionStack>
>           <frame procname="adhoc" line="1" stmtstart="160" stmtend="462" 
> sqlhandle="0x02000000e94be81b7f4521ee46bc4614740d406162aad19d0000000000000000000000000000000000000000">unknown</frame>
>           <frame procname="unknown" line="1" 
> sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">unknown</frame>
>         </executionStack>
>         <inputbuf>(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 bigint,@P3 
> bigint,@P4 nvarchar(4000))UPDATE hostcomponentstate SET current_state = @P0 
> WHERE ((((component_name = @P1) AND (host_id = @P2)) AND (cluster_id = @P3)) 
> AND (service_name = @P4))</inputbuf>
>       </process>
>     </process-list>
>     <resource-list>
>       <keylock hobtid="72057594165723136" dbid="5" 
> objectname="ambari.dbo.hostcomponentstate" 
> indexname="PK__hostcomp__C72E1492ED078925" id="lock113b1de80" mode="X" 
> associatedObjectId="72057594165723136">
>         <owner-list>
>           <owner id="process10dfa1c28" mode="X" />
>         </owner-list>
>         <waiter-list>
>           <waiter id="process10faf6ca8" mode="U" requestType="wait" />
>         </waiter-list>
>       </keylock>
>       <keylock hobtid="72057594165723136" dbid="5" 
> objectname="ambari.dbo.hostcomponentstate" 
> indexname="PK__hostcomp__C72E1492ED078925" id="lock110cf0b00" mode="X" 
> associatedObjectId="72057594165723136">
>         <owner-list>
>           <owner id="process10faf6ca8" mode="X" />
>         </owner-list>
>         <waiter-list>
>           <waiter id="process10dfa1c28" mode="U" requestType="wait" />
>         </waiter-list>
>       </keylock>
>     </resource-list>
>   </deadlock>
> </deadlock-list>
> {code}
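> Both processes hold an X lock on a key in clustered index 
> {{PK__hostcomp__C72E1492ED078925}}, and each then requests a U lock on the 
> key the other one holds (the {{waitresource}} values {{(fd5c95c6a91a)}} and 
> {{(930b38dc45e3)}} are different keys). My guess is that this happens because 
> they are doing a Clustered Index Update and are requesting an escalation to a 
> table lock. Either way, this circular wait is where the deadlock occurs.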
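The {{<resource-list>}} in the graph above can be read as a wait-for graph: each waiter points at the owner of the lock it is waiting for. The sketch below parses a trimmed copy of that list with the JDK's DOM parser, builds the waiter-to-owner edges and follows them to show the two-process cycle. It assumes each process waits on at most one lock (true for this graph) and only inspects {{<keylock>}} elements; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DeadlockGraphCycle {
    // Trimmed copy of the <resource-list> above; unused attributes omitted.
    static final String GRAPH =
          "<resource-list>"
        + "<keylock id=\"lock113b1de80\" mode=\"X\">"
        + "<owner-list><owner id=\"process10dfa1c28\" mode=\"X\"/></owner-list>"
        + "<waiter-list><waiter id=\"process10faf6ca8\" mode=\"U\"/></waiter-list>"
        + "</keylock>"
        + "<keylock id=\"lock110cf0b00\" mode=\"X\">"
        + "<owner-list><owner id=\"process10faf6ca8\" mode=\"X\"/></owner-list>"
        + "<waiter-list><waiter id=\"process10dfa1c28\" mode=\"U\"/></waiter-list>"
        + "</keylock>"
        + "</resource-list>";

    // Builds waiter -> owners edges: a waiter is blocked by every owner of the same lock.
    static Map<String, Set<String>> waitsFor(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Set<String>> edges = new TreeMap<>();
        NodeList locks = doc.getElementsByTagName("keylock");
        for (int i = 0; i < locks.getLength(); i++) {
            Element lock = (Element) locks.item(i);
            NodeList owners = lock.getElementsByTagName("owner");
            NodeList waiters = lock.getElementsByTagName("waiter");
            for (int w = 0; w < waiters.getLength(); w++) {
                String waiter = ((Element) waiters.item(w)).getAttribute("id");
                for (int o = 0; o < owners.getLength(); o++) {
                    String owner = ((Element) owners.item(o)).getAttribute("id");
                    edges.computeIfAbsent(waiter, k -> new TreeSet<>()).add(owner);
                }
            }
        }
        return edges;
    }

    // Follows the first outgoing edge from each process (enough when every
    // process waits on one lock); returns the path if it leads back to start.
    static List<String> cycleFrom(String start, Map<String, Set<String>> edges) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            Set<String> next = edges.get(current);
            current = (next == null || next.isEmpty()) ? null : next.iterator().next();
        }
        return start.equals(current) ? path : Collections.<String>emptyList();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Set<String>> edges = waitsFor(GRAPH);
        edges.forEach((w, owners) -> System.out.println(w + " waits for " + owners));
        System.out.println("cycle: " + cycleFrom("process10faf6ca8", edges));
    }
}
```

On this input each process is reported as waiting for the other, and the last line is "cycle: [process10faf6ca8, process10dfa1c28]".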



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
