Re: Review Request 36895: Cluster creates stuck at 9x% (deadlock sql exception)

Jonathan Hurley Wed, 29 Jul 2015 18:12:07 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36895/
-----------------------------------------------------------


(Updated July 29, 2015, 9:11 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Sumit Mohanty.


Changes
-------

As it turns out, there are 2 problems with the original patch:
- SQL Azure doesn't support the `sp_indexoption` stored procedure (but SQL IAAS 
did)
- SQL Azure encountered a page lock on larger cluster deployments (SQL IAAS 
didn't)

So, now we're back to the clustered index. This change moves the PK index from 
CLUSTERED to NONCLUSTERED so that the table doesn't need to be re-organized and 
thus the X locks are not needed.
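
For illustration, the shape of that change could look roughly like the T-SQL 
sketch below. The column list is abridged and assumed, and the constraint name 
is hypothetical; the real definition lives in Ambari-DDL-SQLServer-CREATE.sql.

```sql
-- Sketch only: column names/types are assumptions, not copied from the patch.
CREATE TABLE hostcomponentstate (
  cluster_id     BIGINT       NOT NULL,
  component_name VARCHAR(255) NOT NULL,
  host_id        BIGINT       NOT NULL,
  service_name   VARCHAR(255) NOT NULL,
  current_state  VARCHAR(255) NOT NULL,
  -- Hypothetical constraint name. The 4-column PK is declared NONCLUSTERED,
  -- so the rows are not physically ordered by the key.
  CONSTRAINT PK_hostcomponentstate_sketch PRIMARY KEY NONCLUSTERED
    (cluster_id, component_name, host_id, service_name)
);
```

The only difference from a default SQL Server primary key is the NONCLUSTERED 
keyword; without it, SQL Server makes the PK index CLUSTERED when the table 
has no other clustered index.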

Now, this will be slightly less performant, but it only affects SQL Server. I 
have another patch where I added a surrogate PK long ID column and made that 
CLUSTERED. This other patch also moves the 4 columns to a NONCLUSTERED index. 
The only benefit is that the CLUSTERED PK can be used during the JPA merge, 
giving a slight performance gain. However, all databases would lose querying by 
PK, so this wasn't a very desirable option.

This patch is the lowest-impact change that resolves the issue.


Bugs: AMBARI-12570
    https://issues.apache.org/jira/browse/AMBARI-12570


Repository: ambari


Description
-------

Similar to AMBARI-12526, Ambari installation via a blueprint on SQL Azure gets 
stuck somewhere between 90% and 100% because of a SQL Database deadlock.

- We have dual X-locks on hostcomponentstate asking for U-locks when updating 
the CLUSTERED INDEX.
- Both X-locks, from different transactions and different processes, appear to 
be on the same row, which should be technically impossible. Based on the XML 
execution plan, we can see that the concurrent UPDATE statements are executing 
on different rows due to their CLUSTERED INDEX predicate.
- In Java, Ambari has locks which prevent concurrent U- or X-locks on the same 
row.
- Only happens on SQL Server

My best suspicion right now is that we have a key hash collision happening on 
this table. That's why two processes appear to have the same lock even though 
they are on different rows.

Restricting row-level locking on this table will prevent locking on hash keys 
which could collide.
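
As a rough sketch, restricting row-level locking can be expressed in T-SQL in 
either of the two ways below. The exact statement used by the original patch 
is not shown in this message, and applying the option to every index on the 
table is an assumption made for illustration.

```sql
-- Deprecated stored-procedure form. Per the Changes section above,
-- SQL Azure does not support sp_indexoption.
EXEC sp_indexoption 'hostcomponentstate', 'AllowRowLocks', FALSE;

-- ALTER INDEX form of the same setting, applied to all indexes on the table.
ALTER INDEX ALL ON hostcomponentstate SET (ALLOW_ROW_LOCKS = OFF);
```

With row locks disallowed, SQL Server falls back to page or table locks for 
these indexes, which is the trade-off behind the page-lock problem noted in the 
Changes section.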


Diffs (updated)
-----

  ambari-server/src/main/resources/Ambari-DDL-SQLServer-CREATE.sql 0ff1aff 

Diff: https://reviews.apache.org/r/36895/diff/


Testing
-------

Deployed a clean cluster on SQL Server and then ran 10+ deployments on SQL 
Azure without seeing a deadlock.


Thanks,

Jonathan Hurley
