Thanks for the response! To clarify a few things, we're using Derby 10.15.2.0 and Java 11. Our project is sticking with Java 11 for the time being so I can't upgrade to Derby 10.16 at this time. The version of Qpid we're using at present is 7.1.12.
I don't see anything suspicious regarding the INSERT statement, but maybe you could see something that I don't. I believe the relevant code is here:

https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675

That does not commit the transaction; the transaction is committed elsewhere in the same file, here:

https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264

I should also note that one time I saw the deadlock happen with a DELETE statement instead of an INSERT, so I don't think the unexpected row locking in SYSCONGLOMERATES is limited to INSERT.

Here is the DDL as produced by Derby's dblook:

-- ----------------------------------------------
-- DDL Statements for tables
-- ----------------------------------------------

CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);

CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL, "CONTENT" BLOB(2147483647));

CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID" VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);

CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL, "REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647), "LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET" BLOB(2147483647));

CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL, "META_DATA" BLOB(2147483647));

CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL);

CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA NOT NULL);

CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT NULL, "VERSION_TIME" TIMESTAMP);

-- ----------------------------------------------
-- DDL Statements for keys
-- ----------------------------------------------
-- PRIMARY/UNIQUE

ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT "SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY ("LINK_KEY");

ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT "SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY ("MESSAGE_ID");

ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT "SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY ("QUEUE_ID", "MESSAGE_ID");

ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT "SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT", "GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID");

ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT "SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT", "GLOBAL_ID", "BRANCH_ID");

ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT "SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY ("MESSAGE_ID");

ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT "SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY ("VERSION");

And just in case I'm doing something stupid, here is the current version of my code. conn.getAutoCommit() is false, but I also tried it with an autocommit connection and without the commit statements that time. The code is inside a try-catch-finally that is inside a TimerTask.
conn = newConnection();
LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit());

LOGGER.info("Compressing qpid_queue_entries");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_QUEUE_ENTRIES', 0)").executeUpdate();
conn.commit();

LOGGER.info("Compressing qpid_message_metadata");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_MESSAGE_METADATA', 0)").executeUpdate();
conn.commit();

LOGGER.info("Compressing qpid_message_content");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_MESSAGE_CONTENT', 0)").executeUpdate();
conn.commit();

Thanks for any help you can provide!

On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <rick.hille...@gmail.com> wrote:

> I'm not an expert on the table compression code, but it does need an
> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
>
> If I'm interpreting the error correctly, the table compressor is blocked
> trying to escalate its lock on the SYSCONGLOMERATES row from shared to
> exclusive. But it can't do this because the INSERT transaction holds a
> shared lock on the SYSCONGLOMERATES row. I don't understand why the
> INSERT transaction would need to lock a SYSCONGLOMERATES row.
> The following simple experiment shows that, by itself, a simple INSERT
> should not lock any rows in SYSCONGLOMERATES:
>
> ij version 10.17
> ij> CONNECT 'jdbc:derby:memory:db;create=true';
> ij> AUTOCOMMIT OFF;
> ij> CREATE TABLE t(a INT);
> 0 rows inserted/updated/deleted
> ij> CREATE INDEX t_idx ON t(a);
> 0 rows inserted/updated/deleted
> ij> CREATE TABLE s(a INT);
> 0 rows inserted/updated/deleted
> ij> INSERT INTO s VALUES (1);
> 1 row inserted/updated/deleted
> ij> COMMIT;
> ij> INSERT INTO t SELECT * FROM s;
> 1 row inserted/updated/deleted
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME
> -------------------------------------------------------------------
> 186 |TABLE|IX  |T        |Tablelock|GRANT|T        |2    |NULL
> 186 |ROW  |X   |T        |(1,7)    |GRANT|T        |1    |NULL
> 2 rows selected
> ij> COMMIT;
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME
> -------------------------------------------------------------------
> 0 rows selected
>
> Is there anything else going on in the INSERT transaction? Can you share
> the table DDL with us, including indexes, constraints, auto-generated
> columns, etc.?
>
> -Rick
>
> On 9/28/22 2:12 PM, Nathan Jensen wrote:
> > Hi, we are using Derby as the backing message store for Qpid-Broker-J.
> We have a continuous data flow into Qpid, so each message gets persisted
> to the Derby message store as data comes in and then it is deleted from
> the Derby message store as data is picked up and processed. The data flow
> is pretty constant, there are peaks and valleys, but basically data is
> always coming in so there are LOTS of inserts and deletes. Over time we
> found that the Derby message store fills up the disk drive until it runs
> out of space, and in general the workaround to that has been to stop Qpid
> and remove the message store (i.e. the Derby directory), which loses data.
>
> My attempted solution to this was to extend Qpid's DerbyMessageStore with
> a DerbyRepackingMessageStore that executes the
> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer. However, on a
> test system with flowing data, this always has a deadlock exception and
> some data (the aborted transaction) is lost. An example is:
>
> Caused by: org.apache.derby.shared.common.error.StandardException: A lock
> could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>   Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
>   message_id , meta_data ) values (?, ?)
>   Granted XID : {501214829, X}
> Lock : ROW, SYSCONGLOMERATES, (7,8)
>   Waiting XID : {501214829, X} , APP, alter table
>   "APP"."QPID_MESSAGE_METADATA" compress
>   Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> . The selected victim is XID : 501214885.
> at org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
> at org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
> at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
> at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
>
> My understanding of that is that the deadlock is occurring because the
> insert has a shared lock on the sysconglomerates row and wants an
> exclusive lock on the qpid_message_metadata table, while the compress has
> an exclusive lock on the qpid_message_metadata table and wants an
> exclusive lock on the sysconglomerates row, so they block each other.
>
> I found this thread from May of 2009 which had a similar issue:
> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f
> I would expect the compress command to go for the lock(s), wait until it
> gets the lock(s), then block the other threads and do the compress. But
> it deadlocks instead.
>
> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683,
> I tried using a non-autocommit connection and locking the table
> exclusively before calling compress and then commit, but that got the
> same errors. Based on this thread,
> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> the in place compress and it succeeds without deadlock. However, it is
> significantly slower, too slow to be a workable solution for my use case,
> and it also doesn't return as much space to the operating system.
>
> Does this look like a Derby bug where the compress command will deadlock
> if there are inserts happening at roughly the same time? Or is this
> expected? Do both the inserts and compress need a sysconglomerates row
> lock? Is there a way the compress command could get both its locks at the
> same time to avoid the deadlock?
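P.S. In the meantime, since Derby aborts only the chosen victim transaction and a compress that loses the race can simply be run again, I've been experimenting with wrapping each compress call in a retry loop keyed off SQLState 40001 (the deadlock-victim code Derby reports). This is just a sketch of the idea, not anything from Derby or Qpid: the SqlAction interface, retryOnDeadlock helper, and the backoff numbers are all my own invention.

```java
import java.sql.SQLException;

public class CompressRetry {

    /** A unit of JDBC work that Derby may abort as a deadlock victim. */
    public interface SqlAction {
        void run() throws SQLException;
    }

    /**
     * Runs the action, retrying up to maxAttempts times when it fails with
     * SQLState 40001 (deadlock victim). Any other SQLException, or running
     * out of attempts, propagates to the caller. Sleeps a little longer
     * before each successive retry (simple linear backoff).
     */
    public static void retryOnDeadlock(SqlAction action, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return;
            } catch (SQLException e) {
                if (!"40001".equals(e.getSQLState()) || attempt >= maxAttempts) {
                    throw e; // not a deadlock, or attempts exhausted
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

Each compress in my TimerTask would then become something like `retryOnDeadlock(() -> { stmt.executeUpdate(); conn.commit(); }, 3, 1000L);`. It doesn't fix the underlying lock escalation, but it should at least stop the compress from being silently abandoned when it loses a deadlock.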