Hello everyone, I just solved a hard-to-troubleshoot deadlock bug in my application. The fact that a deadlock occurred may or may not be a flaw in Derby, but the failure of the deadlock detection code to detect the cycle is probably a bug. I'm wondering if a knowledgeable person has time to read my analysis and advise if I should file a defect or not.
Problem summary: Even using READ_COMMITTED, a single non-updatable SELECT and a single UPDATE statement can deadlock against each other when an index includes the updated column. The transactions fail due to a lock timeout and the deadlock is not detected, possibly because a transaction of type InternalTransaction is part of the cycle. More details: My test case uses the following table and index: CREATE TABLE urls (urlid INTEGER NOT NULL PRIMARY KEY, url VARCHAR(2048) NOT NULL UNIQUE, site INTEGER, expectation INTEGER, jobflag CHAR DEFAULT 'N'); CREATE INDEX findurlbysiteandjob ON urls(site,jobflag); My test case creates two threads and executes the following statements until they deadlock against each other: UPDATE urls SET jobflag=? WHERE urlid=? SELECT urlid,url,expectation FROM urls WHERE site=? The test eventually deadlocks with the following transaction and lock table contents: XID TYPE MODE TABLENAME LOCKNAME STATE TABLETYPE LOCKCOUNT INDEXNAME 2217109 ROW S URLS (13,1) GRANT T 1 FINDURLBYSITEANDJOB 2217114 ROW X URLS (13,1) WAIT T 0 FINDURLBYSITEANDJOB 2217113 ROW S URLS (15,1) GRANT T 1 FINDURLBYSITEANDJOB 2217113 ROW X URLS (3,132) GRANT T 3 null 2217109 ROW S URLS (3,132) WAIT T 0 null 2217109 TABLE IS URLS Tablelock GRANT T 2 null 2217113 TABLE IX URLS Tablelock GRANT T 4 null 2217114 TABLE IX URLS Tablelock GRANT T 1 null 2217113 ROW S URLS (6,1) GRANT T 1 SQL081111021116970 XID GLOBAL_XID USERNAME TYPE STATUS FIRST_INSTANT SQL_TEXT 2217115 null APP UserTransaction IDLE null select * from SYSCS_DIAG.TRANSACTION_TABLE 2217114 null APP InternalTransaction ACTIVE null UPDATE urls SET jobflag=? WHERE urlid=? 2217113 null APP UserTransaction ACTIVE (526,52925) UPDATE urls SET jobflag=? WHERE urlid=? 2069160 null null SystemTransaction IDLE null null 2217109 null APP UserTransaction ACTIVE null SELECT urlid,url,expectation FROM urls WHERE site=? Here is what I think is happening: 1. The SELECT statement begins to execute and the cursor is stepping through the result set. The results are derived from index FINDURLBYSITEANDJOB as expected. 2. The UPDATE statement begins to execute. The row to be updated is the row immediately after the SELECT statement's cursor. The row is locked and updated. 3. The UPDATE statement must perform index maintenance (tree rebalancing or similar?). This apparently causes an InternalTransaction to be created. It then must lock the row that the SELECT statement's cursor is currently occupying. It cannot do this, so the transaction waits. 4. The SELECT statement is ready to advance the cursor. However, it cannot advance the cursor because the UPDATE statement has locked the next row. The transaction waits. The result: Transaction 2217113 waits for the "nested transaction" 2217114 to complete. 2217114 waits for 2217109 to release its lock. 2217109 waits for 2217113 to release its lock. We have a cycle and a deadlock. Even worse, the transactions time out with "A lock could not be obtained within the time requested", apparently because the dependency between transactions 2217113 and 2217114 is not detected. Why I think this is a problem: 1. One would not expect these two statements to deadlock against each other. Intuitively, the UPDATE should wait for the cursor to go away before locking rows around it. Interestingly, this is a problem because the UPDATE locks rows in the opposite order that the SELECT locks them, which greatly increases the probability of a deadlock. 2. Apparently, the only way to avoid this deadlock (other than resorting to READ UNCOMMITTED or other uglyness) is to LOCK TABLE before updating. Because this deadlock scenario is so common (it could happen whenever an index is simultaneously SELECTing and UPDATEing), this could be disastrous for performance. 3. The failure to detect the deadlock obscures the problem and makes troubleshooting more difficult, especially since it's not intuitive that this would cause a deadlock in the first place. Suggested improvements (I am not a Derby developer so feel free to give critical comments): 1. Properly consider the dependency between a UserTransaction and an InternalTransaction when detecting transaction/lock cycles. 2. Row locking could be more pessimistic when updating indicies. Instead of locking rows/nodes in the index one by one, could one lock cover all rows/nodes that must be modified? 3. When locking in an index for an update, could rows/nodes be locked according to the "sort order" of the index, instead of being locked in the reverse order? 4. At the READ COMMITTED isolation level, could the cursor for a SELECT statement release its lock on the current row before acquiring a lock on the next row? (this would probably mess up other things.) Developers, do you advise that I file defects requesting any of these suggested improvements? Thanks for reading this -- Derby has worked very well for me, and I just want to help the development team make it even better.
