I'm not an expert on the table compression code, but it does need an
overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
If I'm interpreting the error correctly, the table compressor is blocked
trying to upgrade its lock on the SYSCONGLOMERATES row from shared to
exclusive. It can't do this because the INSERT transaction also holds a
shared lock on that row. I don't understand why the INSERT transaction
would need to lock a SYSCONGLOMERATES row at all. The following
experiment shows that, by itself, a simple INSERT should not lock any
rows in SYSCONGLOMERATES:
ij version 10.17
ij> CONNECT 'jdbc:derby:memory:db;create=true';
ij> AUTOCOMMIT OFF;
ij> CREATE TABLE t(a INT);
0 rows inserted/updated/deleted
ij> CREATE INDEX t_idx ON t(a);
0 rows inserted/updated/deleted
ij> CREATE TABLE s(a INT);
0 rows inserted/updated/deleted
ij> INSERT INTO s VALUES (1);
1 row inserted/updated/deleted
ij> COMMIT;
ij> INSERT INTO t SELECT * FROM s;
1 row inserted/updated/deleted
ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCKCOUNT|INDEXNAME
-----------------------------------------------------------------------
186 |TABLE|IX  |T        |Tablelock|GRANT|T        |2        |NULL
186 |ROW  |X   |T        |(1,7)    |GRANT|T        |1        |NULL
2 rows selected
ij> COMMIT;
ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCKCOUNT|INDEXNAME
-----------------------------------------------------------------------
0 rows selected
Is there anything else going on in the INSERT transaction? Can you share
the table DDL with us, including indexes, constraints, auto-generated
columns, etc.?
-Rick
On 9/28/22 2:12 PM, Nathan Jensen wrote:
Hi, we are using Derby as the backing message store for Qpid-Broker-J. We
have a continuous data flow into Qpid, so each message is persisted to the
Derby message store as data comes in and then deleted from it as data is
picked up and processed. The flow is fairly constant; there are peaks and
valleys, but data is essentially always arriving, so there are LOTS of
inserts and deletes. Over time we found that the Derby message store fills
the disk drive until it runs out of space, and in general the workaround
has been to stop Qpid and remove the message store (i.e., the Derby
directory), which loses data.
My attempted solution was to extend Qpid's DerbyMessageStore with a
DerbyRepackingMessageStore that executes the
SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure on a timer (a sketch of the
compress call follows the stack trace below). However, on a test system
with flowing data, this consistently fails with a deadlock exception, and
some data (the aborted transaction) is lost. An example is:
Caused by: org.apache.derby.shared.common.error.StandardException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
  Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA( message_id , meta_data ) values (?, ?)
  Granted XID : {501214829, X}
Lock : ROW, SYSCONGLOMERATES, (7,8)
  Waiting XID : {501214829, X} , APP, alter table "APP"."QPID_MESSAGE_METADATA" compress
  Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
. The selected victim is XID : 501214885.
        at org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
        at org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
        at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
        at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
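For context, the compress call in the repacking task boils down to
something like this (a simplified sketch; the real store manages its own
connections, and the JDBC URL here is just a placeholder):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class RepackTask {
    // Invoke Derby's offline compress on the message-metadata table.
    static void compress(String jdbcUrl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             CallableStatement cs = conn.prepareCall(
                 "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)")) {
            cs.setString(1, "APP");                   // schema name
            cs.setString(2, "QPID_MESSAGE_METADATA"); // table name
            cs.setShort(3, (short) 1);                // non-zero = sequential rebuild
            cs.execute();
        }
    }
}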
My understanding is that the deadlock occurs because the insert holds a
shared lock on the SYSCONGLOMERATES row and is waiting for a table lock on
QPID_MESSAGE_METADATA, while the compress holds an exclusive lock on the
QPID_MESSAGE_METADATA table and is waiting for an exclusive lock on the
SYSCONGLOMERATES row, so they block each other.
I found this thread from May of 2009 which had a similar issue:
https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f
I would expect the compress command to request the lock(s), wait until it
gets them, then block the other threads and do the compress. But it
deadlocks instead.
Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683, I
tried using a non-autocommit connection and locking the table exclusively
before calling compress and then committing, but that hit the same
deadlock. Roughly, the explicit-lock variant looked like the sketch below.
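(Again a simplified sketch with error handling trimmed; imports are the
same java.sql classes as in the earlier snippet, plus java.sql.Statement.)

static void lockThenCompress(Connection conn) throws SQLException {
    conn.setAutoCommit(false);
    try (Statement st = conn.createStatement()) {
        // Take the exclusive table lock up front, then compress, then commit.
        st.execute("LOCK TABLE APP.QPID_MESSAGE_METADATA IN EXCLUSIVE MODE");
        try (CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)")) {
            cs.setString(1, "APP");
            cs.setString(2, "QPID_MESSAGE_METADATA");
            cs.setShort(3, (short) 1); // sequential rebuild
            cs.execute();
        }
        conn.commit();
    }
}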
Based on this thread,
https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
the in-place compress, and it succeeds without deadlocking (a sketch
follows below). However, it is significantly slower, too slow to be a
workable solution for my use case, and it also doesn't return as much
space to the operating system.
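For reference, the in-place variant looks roughly like this (a sketch;
the three flags enable the purge, defragment, and truncate phases):

static void inPlaceCompress(Connection conn) throws SQLException {
    try (CallableStatement cs = conn.prepareCall(
            "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)")) {
        cs.setString(1, "APP");
        cs.setString(2, "QPID_MESSAGE_METADATA");
        cs.setShort(3, (short) 1); // purge deleted rows
        cs.setShort(4, (short) 1); // defragment remaining rows
        cs.setShort(5, (short) 1); // truncate free space at the end of the file
        cs.execute();
    }
}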
Does this look like a Derby bug, where the compress command will deadlock
if inserts are happening at roughly the same time? Or is this expected?
Do both the inserts and the compress need a SYSCONGLOMERATES row lock? Is
there a way the compress command could acquire both of its locks at the
same time to avoid the deadlock?