Posting the complete information from derby.log may help.  Just post
all of it, from the most recent boot message through the end of the
error.

My best guess, with the limited information so far, is that you are
encountering an index split deadlock.  In the case below, transaction
6154 is the writer, 6159 is the reader, and 6160 is an internal
transaction trying to do an index split on behalf of 6154.  The
(10,1) row lock is a special internal Derby lock that allows better
performance for scanning but reduced concurrency for splitting; it
basically protects the scan "position" on a page while the scan does
not hold a latch on the page.  I can't be sure, but from your program
description 6154 holds row locks on every row in the table, is likely
in the insert phase, and needs to split the btree index.  It forks a
child transaction, 6160, to do the split, and that child is waiting
on 6159, the reader.  The reader in turn is blocked on row (2,110),
which is one of the many row locks held by the writer.
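
Spelled out from the lock dump you posted below, the wait cycle is:

  6159 (reader)       waits for row (2,110), held X by 6154 (writer)
  6160 (split child)  waits for row (10,1), held S by 6159 (reader)
  6154 (writer)       cannot release (2,110) until its child 6160
                      finishes the split

so none of the three can make progress.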

As a workaround, since your application knows it is going to delete
all the rows in the table, it may work to simply use table-level
locking for the update transaction that deletes all the rows.  This
may actually perform slightly better than acquiring an individual
row lock on each row, and in the future it may also allow better
space-management optimizations for the delete:
http://db.apache.org/derby/docs/10.2/ref/rrefsqlj40506.html
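
A minimal sketch of what I mean, using the LOCK TABLE statement that
the page above documents (the table name is taken from your lock
dump; the JDBC connection setup is assumed):

  // Inside the writer's transaction, with autocommit off.
  Statement s = conn.createStatement();
  // One exclusive table lock instead of ~300 individual row locks.
  s.executeUpdate("LOCK TABLE CAL_TIMERANGE_INDEX IN EXCLUSIVE MODE");
  s.executeUpdate("DELETE FROM CAL_TIMERANGE_INDEX");
  // ... insert the replacement rows on the same connection ...
  conn.commit();   // the table lock is released at commit

Note that the table lock only lasts until the end of the transaction,
so LOCK TABLE has to run in the same transaction as the delete and
the re-inserts.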


The best way to track down what is going on with the thread that you
don't think should be active is to reproduce the problem with some
extra information being logged.  That way it should be obvious
whether the extra transaction has committed or not, and whether it is
an internal transaction.  The derby.log may also contain stack
traces, which would give me a better idea of what is going on:

Set derby.locks.monitor=true and derby.locks.deadlockTrace=true in
derby.properties.  This will print the lock table to derby.log when a
timeout or deadlock happens.  If you also set
derby.language.logStatementText=true, then all the statements
executed on the system will be written to derby.log.  The statement
information includes the id of the transaction executing the
statement, so you can take the transaction id holding the lock in the
lock table dump and do a reverse search for it in derby.log,
narrowing down which statements may be holding the locks.
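
For reference, here is what those three entries look like in
derby.properties (just the properties named above, nothing else):

  derby.locks.monitor=true
  derby.locks.deadlockTrace=true
  derby.language.logStatementText=true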



Randy Letness wrote:
Mike Matrigali wrote:

Also, just posting the deadlock message with the lock information
for the locks in the deadlock cycle will answer a lot of questions.
The behavior may depend on whether there are indexes, how many rows
are in the table, and whether lock escalation happened.  Also make
sure to include which version of the software you are running.

I'm running Derby 10.2.1.6.

The message is:

2007-01-31 16:57:53.828 GMT Thread[reader0,5,main] (XID = 6159), (SESSIONID = 231), (DATABASE = target/test-db), (DRDAID = null), Failed Statement is: select distinct i.* from item i, cal_timerange_index ctri1 where i.parentid=? and i.isactive=1 and ctri1.itemid=i.id and (ctri1.componenttype = ? and ((ctri1.startdate < case when ctri1.isfloating=1 then '20080218T115000' else ? end and ctri1.enddate > case when ctri1.isfloating=1 then '20080117T115000' else ? end) or (ctri1.startdate >= case when ctri1.isfloating=1 then '20080117T115000' else ? end and ctri1.startdate < case when ctri1.isfloating=1 then '20080218T115000' else ? end)))

So the select statement is a join of two tables.  The statement
returns about 30 rows, and there are roughly 300 rows in
cal_timerange_index.  The update transaction removes all of these
rows and adds the same number back, while the read transaction
selects 30 of the rows that are being deleted by the update
transaction.

The lock dump shows:

*** The following row is the victim ***
6159 |ROW |S |0 |(2,110) |WAIT |T |NULL |CAL_TIMERANGE_INDEX |

Here are the locks involved:
6154 |ROW |X |1 |(2,110) |GRANT|T |NULL               |CAL_TIMERANGE_INDEX |
6159 |ROW |S |1 |(10,1)  |GRANT|T |SQL070131105108180 |CAL_TIMERANGE_INDEX |
6160 |ROW |X |0 |(10,1)  |WAIT |T |SQL070131105108180 |CAL_TIMERANGE_INDEX |

There are also these table locks:
6154 |TABLE |IX |315 |Tablelock |GRANT|T |NULL |CAL_TIMERANGE_INDEX |
6159 |TABLE |IS |2   |Tablelock |GRANT|T |NULL |CAL_TIMERANGE_INDEX |
6160 |TABLE |IX |1   |Tablelock |GRANT|T |NULL |CAL_TIMERANGE_INDEX |

A couple of things to note.  There are 3 transactions, yet only 2
should be active at a time (1 writer thread, 1 reader thread).  Also,
none of these transactions is in a traditional deadlock (where they
are waiting on each other).  Instead, the deadlock occurs because tx
6154 doesn't release its locks: 6154 isn't waiting on anything, yet
its locks are still held.  It's almost as if the locks aren't getting
released.  I think I'm going to try to work on a simpler example to
reproduce this.
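
Roughly along these lines (the schema, table name, and row counts
here are placeholders I made up, not my real ones):

  import java.sql.*;

  public class DeadlockRepro {
      static final String URL = "jdbc:derby:target/test-db;create=true";

      public static void main(String[] args) throws Exception {
          // Load the embedded driver explicitly (needed on 10.2).
          Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
          Connection c = DriverManager.getConnection(URL);
          // Assumes a fresh database directory.
          c.createStatement().executeUpdate(
              "CREATE TABLE T (ID INT PRIMARY KEY, V INT)");
          c.close();
          new Thread(new Writer(), "writer0").start();
          new Thread(new Reader(), "reader0").start();
      }

      // One update transaction per iteration: delete every row,
      // re-insert the same number, then commit.
      static class Writer implements Runnable {
          public void run() {
              try {
                  Connection c = DriverManager.getConnection(URL);
                  c.setAutoCommit(false);
                  Statement del = c.createStatement();
                  PreparedStatement ins =
                      c.prepareStatement("INSERT INTO T VALUES (?, ?)");
                  while (true) {
                      del.executeUpdate("DELETE FROM T");
                      for (int i = 0; i < 300; i++) {
                          ins.setInt(1, i);
                          ins.setInt(2, i);
                          ins.executeUpdate();
                      }
                      c.commit();
                  }
              } catch (SQLException e) {
                  e.printStackTrace();  // deadlock/timeout shows up here
              }
          }
      }

      // One read transaction per iteration, selecting a slice of
      // the rows the writer is churning.
      static class Reader implements Runnable {
          public void run() {
              try {
                  Connection c = DriverManager.getConnection(URL);
                  c.setAutoCommit(false);
                  Statement sel = c.createStatement();
                  while (true) {
                      ResultSet rs = sel.executeQuery(
                          "SELECT * FROM T WHERE ID < 30");
                      while (rs.next()) { rs.getInt(1); }
                      rs.close();
                      c.commit();
                  }
              } catch (SQLException e) {
                  e.printStackTrace();
              }
          }
      }
  }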

-Randy

Kristian Waagan wrote:

Randy Letness skrev:

A little more info:

I have a test program with a "writer thread", a thread that
continuously performs update transactions, and a "reader thread", a
thread that does selects on the data being updated.  So with two
threads, there should be at most 2 transactions in progress at any
instant.  When the deadlock occurs, the lock table dump shows
otherwise: there are 3 distinct transaction ids (1, 2, 3 for
simplicity).  Looking at the locks, I can tell that tx1 and tx3 are
update transactions (they have exclusive locks) and tx2 is the read
tx (it has shared locks).  This doesn't make sense, because there is
only a single thread doing updates, so there should be only a single
update transaction active.  Tx2 is waiting on a lock held by tx1, and
tx3 is waiting on a lock held by tx2.  For some reason the locks held
by tx1 aren't released, and that is why the deadlock occurs.  It's
almost as if the locks from the update transaction aren't being fully
released after a commit.  Tx1 isn't waiting on anything, but the
locks are still there.  So something screwy is going on.  Is there
such a thing as a phantom lock?

Also, I can't reproduce this on a MySQL database, using the same code
with a different JDBC driver.


Hello Randy,

I recommend you log a Jira issue with a simple reproduction
script/program.  Then the community can have a look at it and
determine whether it is a bug or not.  Even if it turns out not to be
a bug, it will be nice to have it documented in the issue tracking
system with an explanation of why Derby behaves this way.

Jira (Derby's issue tracking system) can be accessed here: https://issues.apache.org/jira/browse/DERBY


thanks,





