Ok, below is the log output for the first two compresses. Running select right before the first compress showed no locks so I included the output for the second compress as that was more interesting, though the transaction ids don't match. I removed a bunch of normal qpid data flow logging stuff to shorten the output for clarity, and I formatted the locks printout to hopefully be more readable.
------------------------------------- 2022-09-30T20:48:32,186Z INFO [RepackTimer] (o.a.q.s.d.DerbyRepackingMessageStore) - connection.getAutoCommit() is false 2022-09-30T20:48:32,186Z INFO [RepackTimer] (o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table 2022-09-30T20:48:32,259Z INFO [RepackTimer] (o.a.q.s.d.DerbyRepackingMessageStore) - Locks are: 2022-09-30T20:48:32,260Z INFO [RepackTimer] (o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_queue_entries 2022-09-30T20:48:52,554Z ERROR [IO-/147.18.139.60:45964] (o.a.q.s.d.DerbyRepackingMessageStore) - Failed to enqueue messages java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : TABLE, QPID_QUEUE_ENTRIES, Tablelock Waiting XID : {627716261, IX} , APP, INSERT INTO QPID_QUEUE_ENTRIES (queue_id, message_id) values (?,?) Granted XID : {627715588, X} Lock : ROW, SYSCONGLOMERATES, (7,6) Waiting XID : {627715588, X} , APP, alter table "APP"."QPID_QUEUE_ENTRIES" compress Granted XID : {627715588, S} , {627716262, S} , {627716261, S} . The selected victim is XID : 627716261. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152) at org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41) at org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56) at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740) at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.enqueueMessages(AbstractJDBCMessageStore.java:684) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$700(AbstractJDBCMessageStore.java:81) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$new$0(AbstractJDBCMessageStore.java:1226) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278) at org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294) at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124) at org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117) 2022-09-30T20:49:12,632Z INFO [RepackTimer] (o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table 2022-09-30T20:49:12,644Z INFO [RepackTimer] (o.a.q.s.d.DerbyRepackingMessageStore) - Locks are: 627716294 TABLE IX QPID_MESSAGE_CONTENT Tablelock GRANT T 2 null 627716294 ROW X QPID_MESSAGE_METADATA (118,8) GRANT T 1 null 627716294 ROW X QPID_QUEUE_ENTRIES (96,32) GRANT T 1 null 627716294 TABLE IX QPID_QUEUE_ENTRIES Tablelock GRANT T 2 null 627716285 TABLE IX QPID_MESSAGE_METADATA Tablelock GRANT T 2 null 627716294 TABLE IX QPID_MESSAGE_METADATA Tablelock GRANT T 2 null 627716285 ROW X QPID_MESSAGE_METADATA (150,13) GRANT T 2 null 627716294 ROW X QPID_MESSAGE_CONTENT (81,85) GRANT T 1 null 2022-09-30T20:49:12,644Z INFO [RepackTimer] (o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_message_metadata 2022-09-30T20:49:32,682Z ERROR [IO-/147.18.139.173:44710] (o.a.q.s.u.ServerScopedRuntimeException) - Exception on enqueuing message into message store25904045 java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock Waiting XID : {627716335, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA( message_id , meta_data ) values (?, ?) Granted XID : {627716296, X} Lock : ROW, SYSCONGLOMERATES, (7,8) Waiting XID : {627716296, X} , APP, alter table "APP"."QPID_MESSAGE_METADATA" compress Granted XID : {627716296, S} , {627716335, S} . The selected victim is XID : 627716335. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152) at org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41) at org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56) at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740) at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.storeMetaData(AbstractJDBCMessageStore.java:981) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$1200(AbstractJDBCMessageStore.java:81) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$StoredJDBCMessage.store(AbstractJDBCMessageStore.java:1635) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$enqueueMessage$1(AbstractJDBCMessageStore.java:1240) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289) at org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278) at org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294) at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124) On Fri, Sep 30, 2022 at 2:46 PM Rick Hillegas <rick.hille...@gmail.com> wrote: > I don't have any theories yet about what is going on. Can you try > issuing the following statement just before you compress the table: > > SELECT * FROM SYSCS_DIAG.LOCK_TABLE; > > I don't understand why the INSERT transaction grabs a lock on a > SYSCONGLOMERATES row. > > Thanks, > -Rick > > > On 9/29/22 1:24 PM, Nathan Jensen wrote: > > Thanks for the response! To clarify a few things, we're using Derby > > 10.15.2.0 and Java 11. Our project is sticking with Java 11 for the time > > being so I can't upgrade to Derby 10.16 at this time. The version of > Qpid > > we're using at present is 7.1.12. > > > > I don't see anything suspicious regarding the INSERT statement, but maybe > > you could see something that I don't. I believe the relevant code is > here: > > > https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675 > > That does not commit the transaction, the transaction is committed > > elsewhere in the same file, here: > > > https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264 > > I should also note that one time I saw the deadlock happen with a DELETE > > statement instead of an INSERT, so I don't think the unexpected row > locking > > in SYSCONGLOMERATES is limited to INSERT. > > > > Here is the DDL as produced by Derby's dblook: > > -- ---------------------------------------------- > > -- DDL Statements for tables > > -- ---------------------------------------------- > > > > CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL, > > "MESSAGE_ID" BIGINT NOT NULL); > > > > CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL, > > "CONTENT" BLOB(2147483647)); > > > > CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL, > > "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) > > FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID" > > VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL); > > > > CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL, > > "REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647), > > "LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET" > BLOB(2147483647)); > > > > CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL, > > "META_DATA" BLOB(2147483647)); > > > > CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL); > > > > CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID" > > VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA > > NOT NULL); > > > > CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT > > NULL, "VERSION_TIME" TIMESTAMP); > > > > -- ---------------------------------------------- > > -- DDL Statements for keys > > -- ---------------------------------------------- > > > > -- PRIMARY/UNIQUE > > ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT > > "SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY > > ("LINK_KEY"); > > > > ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT > > "SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY > > ("MESSAGE_ID"); > > > > ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT > > "SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY > > ("QUEUE_ID", "MESSAGE_ID"); > > > > ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT > > "SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY > ("FORMAT", > > "GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID"); > > > > ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT > > "SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY > ("FORMAT", > > "GLOBAL_ID", "BRANCH_ID"); > > > > ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT > > "SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY > > ("MESSAGE_ID"); > > > > ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT > > "SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY > > ("VERSION"); > > > > > > And just in case I'm doing something stupid, here is the current version > of > > my code. conn.getAutoCommit() is false, but I also tried it with an > > autocommit connection and without the commit statements that time. The > > code is inside a try-catch-finally that is inside a TimerTask. > > > > conn = newConnection(); > > LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit()); > > LOGGER.info("Compressing qpid_queue_entries"); > > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', > > 'QPID_QUEUE_ENTRIES', 0)").executeUpdate(); > > conn.commit(); > > LOGGER.info("Compressing qpid_message_metadata"); > > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', > > 'QPID_MESSAGE_METADATA', 0)").executeUpdate(); > > conn.commit(); > > LOGGER.info("Compressing qpid_message_content"); > > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', > > 'QPID_MESSAGE_CONTENT', 0)").executeUpdate(); > > conn.commit(); > > > > Thanks for any help you can provide! > > > > On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <rick.hille...@gmail.com> > > wrote: > > > >> I'm not an expert on the table compression code, but it does need an > >> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683). > >> > >> If I'm interpreting the error correctly, the table compressor is blocked > >> trying to escalate its lock on the SYSCONGLOMERATES row from shared to > >> exclusive. But it can't do this because the INSERT transaction holds a > >> shared lock on the SYSCONGLOMERATES row. I don't understand why the > INSERT > >> transaction would need to lock a SYSCONGLOMERATES row. The following > simple > >> experiment shows that, by itself, a simple INSERT should not lock any > rows > >> in SYSCONGLOMERATES: > >> > >> ij version 10.17 > >> > >> ij> CONNECT 'jdbc:derby:memory:db;create=true'; > >> > >> ij> AUTOCOMMIT OFF; > >> > >> ij> CREATE TABLE t(a INT); > >> > >> 0 rows inserted/updated/deleted > >> > >> ij> CREATE INDEX t_idx ON t(a); > >> > >> 0 rows inserted/updated/deleted > >> > >> ij> CREATE TABLE s(a INT); > >> > >> 0 rows inserted/updated/deleted > >> > >> ij> INSERT INTO s VALUES (1); > >> > >> 1 row inserted/updated/deleted > >> > >> ij> COMMIT; > >> > >> ij> INSERT INTO t SELECT * FROM s; > >> > >> 1 row inserted/updated/deleted > >> > >> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE; > >> > >> XID |TYPE |MODE|TABLENAME > > |LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME > >> > >> > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > >> > >> 186 |TABLE|IX |T > > |Tablelock |GRANT|T |2 |NULL > >> > >> 186 |ROW |X |T > > |(1,7) |GRANT|T |1 |NULL > >> > >> 2 rows selected > >> > >> ij> COMMIT; > >> > >> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE; > >> > >> XID |TYPE |MODE|TABLENAME > > |LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME > >> > >> > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > >> > >> 0 rows selected > >> > >> > >> Is there anything else going on in the INSERT transaction? Can you share > >> the table DDL with us, including indexes, constraints, auto-generated > >> columns, etc.? > >> > >> -Rick > >> > >> > >> On 9/28/22 2:12 PM, Nathan Jensen wrote: > >> > >> Hi, we are using Derby as the backing message store for Qpid-Broker-J. > We > >> have a continuous data flow into Qpid so each message gets persisted to > the > >> Derby message store as data comes in and then it is deleted from the > Derby > >> message store as data is picked up and processed. The data flow is > pretty > >> constant, there are peaks and valleys but basically data is always > coming > >> in so there are LOTS of inserts and deletes. Over time we found that > the > >> Derby message store fills up the disk drive until it runs out of space > and > >> in general the workaround to that has been to stop Qpid and remove the > >> message store (i.e. the Derby directory), which loses data. > >> > >> My attempted solution to this was to extend Qpid's DerbyMessageStore > with a > >> DerbyRepackingMessageStore that executes the > >> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer. However, > on a > >> test system with flowing data, this always has a deadlock exception and > >> some data (the aborted transaction) is lost. An example is: > >> > >> Caused by: org.apache.derby.shared.common.error.StandardException: A > lock > >> could not be obtained due to a deadlock, cycle of locks and waiters is: > >> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock > >> Waiting XID : {501214885, IX} , APP, INSERT INTO > QPID_MESSAGE_METADATA( > >> message_id , meta_data ) values (?, ?) > >> Granted XID : {501214829, X} > >> Lock : ROW, SYSCONGLOMERATES, (7,8) > >> Waiting XID : {501214829, X} , APP, alter table > >> "APP"."QPID_MESSAGE_METADATA" compress > >> Granted XID : {501214829, S} , {501214865, S} , {501214885, S} > >> . The selected victim is XID : 501214885. > >> at > >> > org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300) > >> at > >> > org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295) > >> at > >> > org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557) > >> at > >> > org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647) > >> > >> My understanding of that is that the deadlock is occurring because the > >> insert has a shared lock on the sysconglomerates row and wants an > exclusive > >> lock on the qpid_message_metadata table, while the compress has an > >> exclusive lock on the qpid_message_metadata table and wants an exclusive > >> lock on the sysconglomerates row, so they block each other. > >> > >> I found this thread from May of 2009 which had a similar issue: > https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f I would > >> expect the compress command to go for the lock(s), wait until it gets > the > >> lock(s), then block the other threads and do the compress. But it > >> deadlocks instead. > >> > >> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683, > I > >> tried using a non-autocommit connection and locking the table > exclusively > >> before calling compress and then commit, but that got the same errors. > >> Based on this thread, > https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried > >> the in place compress and it succeeds without deadlock. However, it is > >> significantly slower, too slow to be a workable solution for my use case > >> and it also doesn't return as much space to the operating system. > >> > >> Does this look like a Derby bug where the compress command will > deadlock if > >> there are inserts happening at roughly the same time? Or is this > >> expected? Do both the inserts and compress need a sysconglomerates row > >> lock? Is there a way the compress command could get both its locks at > the > >> same time to avoid the deadlock? > >> > >> > >> > >> > >