Randy Letness wrote:
Mike Matrigali wrote:
Posting complete information from derby.log may help. Just post all
of it from the most recent boot message to the end of the error.
I could, but I didn't want to spam the list with 100K of text.
That's fine. As someone else suggested, go ahead and file a
JIRA. That gives a great place to upload a 100K file so someone else
can look at it.
My best guess, with the limited information so far, is that you are
encountering an index split deadlock. So in the case below, transaction
6154 is the writer, 6159 is the reader, and 6160 is the internal
transaction trying to do an index split on behalf of 6154. The (10,1)
row lock is a special internal lock for Derby which allows better
performance for scanning but reduced concurrency for splitting. It
basically protects the scan "position" on a page while the scan does
not have a latch on the page. I'm not sure exactly, but from your program
description, 6154 has row locks on every row in the table and is likely
in the insert phase and needs to split the btree index, so it forks a
child transaction, 6160, to do the split, and that child is waiting on
6159, the reader. The reader in turn is blocked on row (2,110), which is
one of the many row locks held by the writer.
Ah, this makes sense. I didn't take internal transactions into account.
An index split deadlock, huh? I would never have thought of that. Does
this happen often?
I almost never see it, but of course every app is different. The
problem is that your test case, I think, pretty much guarantees it
happens, since you lock every row in the delete and the scanner is
stuck waiting on one of those rows.
What it takes is the writer adding rows to the same leaf index page
the scanner is blocked on, and then needing to split that page.
In the 3-million-row app, is it likely the scanner looks at the same
rows as the writer?
Altering the page size of the index may give you a partial workaround.
Bigger index pages would mean fewer total splits, and thus less chance
of hitting this deadlock case. But that is not guaranteed.
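If you want to experiment with that, the shape is roughly the following
(just an untested sketch: the index and column names are placeholders
since I don't know your schema, and the page size only applies to
conglomerates created after the property is set, so the index has to be
dropped and recreated):

  -- use 32K pages for indexes/tables created from here on
  CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY('derby.storage.pageSize', '32768');
  -- rebuild the index so it picks up the larger page size (placeholder names)
  DROP INDEX ix_ctri_itemid;
  CREATE INDEX ix_ctri_itemid ON cal_timerange_index (itemid);
  -- set the property back to the 4K default for anything created later
  CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY('derby.storage.pageSize', '4096');

If the database-wide property doesn't take effect for you,
derby.storage.pageSize can also be set in derby.properties before the
index is created.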
As another workaround, since your application knows it is going to
delete all the rows in the table, it may work to just use table-level
locking in the transaction that does the delete of all the rows.
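Something like this at the start of that transaction (a minimal sketch
for your test case, where the delete hits every row; run it with
autocommit off):

  -- take one exclusive table lock up front instead of ~300 row locks
  LOCK TABLE cal_timerange_index IN EXCLUSIVE MODE;
  DELETE FROM cal_timerange_index;
  -- re-insert the replacement rows, then commit; the table lock is
  -- held until the transaction commits or rolls back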
The problem is that in my test program it turns out to be all the rows
in the table, but in the environment where the problem was discovered
it will only be 300 out of 3 million rows, so locking the table would
reduce concurrency.
OK, then the workaround makes no sense there. In the real environment,
is the scanner likely to be looking at the same rows that you are
deleting?
The best way to track down what is going on with the thread that you
don't think should be active is to reproduce the problem with some
extra information being logged. That way it should be obvious whether
the extra transaction has committed or not, and whether it is an
internal transaction. The derby.log may also have stack traces which
would give me a better idea of what is going on.
Set derby.locks.monitor=true and derby.locks.deadlockTrace=true in
derby.properties. This will print the lock table information to
derby.log if a timeout happens. If you also set
derby.language.logStatementText=true, then all the statements executed
on the system will be written out to derby.log. The statement
information also includes the transaction id the statement is being
executed on. You can take the transaction id holding the lock in the
lock table dump and do a reverse search for it in derby.log, which
makes it possible to narrow down which statements may be holding the
locks.
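For reference, that is just these lines in derby.properties (the file
Derby reads from the derby.system.home directory):

  # print the lock table to derby.log when a lock timeout or deadlock happens
  derby.locks.monitor=true
  derby.locks.deadlockTrace=true
  # write every statement, along with its transaction id, to derby.log
  derby.language.logStatementText=true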
I already have derby.locks.monitor=true and derby.locks.deadlockTrace=true
set, and I get the lock table dump. I'll try setting
derby.language.logStatementText=true and see what extra info I can get.
Thanks for all the help.
-Randy
Randy Letness wrote:
Mike Matrigali wrote:
Also, just posting the deadlock message with the lock information
about the locks in the deadlock cycle will answer a lot of questions.
It may depend on whether there are indexes, on how many rows are in
the table, and on whether lock escalation happened. Also make sure to
include which version of the software you are running.
I'm running Derby 10.2.1.6.
The message is:
2007-01-31 16:57:53.828 GMT Thread[reader0,5,main] (XID = 6159),
(SESSIONID = 231), (DATABASE = target/test-db), (DRDAID = null),
Failed Statement is:
select distinct i.* from item i, cal_timerange_index ctri1
where i.parentid=? and i.isactive=1 and ctri1.itemid=i.id
  and (ctri1.componenttype = ?
       and ((ctri1.startdate < case when ctri1.isfloating=1 then '20080218T115000' else ? end
             and ctri1.enddate > case when ctri1.isfloating=1 then '20080117T115000' else ? end)
            or (ctri1.startdate >= case when ctri1.isfloating=1 then '20080117T115000' else ? end
                and ctri1.startdate < case when ctri1.isfloating=1 then '20080218T115000' else ? end)))
So the select statement is a join of two tables. The statement
returns about 30 rows. There are roughly 300 rows in
cal_timerange_index. The update transaction removes all of these rows
and adds the same number back, while the read transaction selects 30
of the rows that are being deleted by the update transaction.
The lock dump shows:

*** The following row is the victim ***
6159 |ROW   |S  |0   |(2,110)   |WAIT |T |NULL               |CAL_TIMERANGE_INDEX |

Here are the locks involved:

6154 |ROW   |X  |1   |(2,110)   |GRANT|T |NULL               |CAL_TIMERANGE_INDEX |
6159 |ROW   |S  |1   |(10,1)    |GRANT|T |SQL070131105108180 |CAL_TIMERANGE_INDEX |
6160 |ROW   |X  |0   |(10,1)    |WAIT |T |SQL070131105108180 |CAL_TIMERANGE_INDEX |

There are also these table locks:

6154 |TABLE |IX |315 |Tablelock |GRANT|T |NULL               |CAL_TIMERANGE_INDEX |
6159 |TABLE |IS |2   |Tablelock |GRANT|T |NULL               |CAL_TIMERANGE_INDEX |
6160 |TABLE |IX |1   |Tablelock |GRANT|T |NULL               |CAL_TIMERANGE_INDEX |
A couple of things to note. There are 3 transactions, yet only 2 should
be active at a time (1 writer thread, 1 reader thread). Also, none of
these transactions is in a traditional deadlock (where they are
waiting on each other). Instead, the deadlock occurs because
transaction 6154 doesn't release its locks. 6154 isn't waiting on
anything; it's almost as if the locks aren't getting released. I think
I'm going to try to work on a simpler example to reproduce this.
-Randy