Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Nathan Jensen Fri, 30 Sep 2022 14:17:07 -0700

Ok, below is the log output for the first two compresses.  Running select
right before the first compress showed no locks so I included the output
for the second compress as that was more interesting, though the
transaction ids don't match.  I removed a bunch of normal qpid data flow
logging stuff to shorten the output for clarity, and I formatted the locks
printout to hopefully be more readable.


-------------------------------------

2022-09-30T20:48:32,186Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - connection.getAutoCommit() is false
2022-09-30T20:48:32,186Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table
2022-09-30T20:48:32,259Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Locks are:

2022-09-30T20:48:32,260Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_queue_entries
2022-09-30T20:48:52,554Z ERROR [IO-/147.18.139.60:45964]
(o.a.q.s.d.DerbyRepackingMessageStore) - Failed to enqueue messages
java.sql.SQLTransactionRollbackException: A lock could not be obtained due
to a deadlock, cycle of locks and waiters is:
Lock : TABLE, QPID_QUEUE_ENTRIES, Tablelock
  Waiting XID : {627716261, IX} , APP, INSERT INTO QPID_QUEUE_ENTRIES
(queue_id, message_id) values (?,?)
  Granted XID : {627715588, X}
Lock : ROW, SYSCONGLOMERATES, (7,6)
  Waiting XID : {627715588, X} , APP, alter table
"APP"."QPID_QUEUE_ENTRIES" compress
  Granted XID : {627715588, S} , {627716262, S} , {627716261, S}
. The selected victim is XID : 627716261.
at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360)
at
org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405)
at
org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41)
at
org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.enqueueMessages(AbstractJDBCMessageStore.java:684)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$700(AbstractJDBCMessageStore.java:81)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$new$0(AbstractJDBCMessageStore.java:1226)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278)
at
org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
at
org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)

2022-09-30T20:49:12,632Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table
2022-09-30T20:49:12,644Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Locks are:
     627716294     TABLE     IX     QPID_MESSAGE_CONTENT      Tablelock
GRANT     T     2     null
     627716294     ROW        X     QPID_MESSAGE_METADATA     (118,8)
GRANT     T     1     null
     627716294     ROW        X     QPID_QUEUE_ENTRIES        (96,32)
GRANT     T     1     null
     627716294     TABLE     IX     QPID_QUEUE_ENTRIES        Tablelock
GRANT     T     2     null
     627716285     TABLE     IX     QPID_MESSAGE_METADATA     Tablelock
GRANT     T     2     null
     627716294     TABLE     IX     QPID_MESSAGE_METADATA     Tablelock
GRANT     T     2     null
     627716285     ROW        X     QPID_MESSAGE_METADATA     (150,13)
 GRANT     T     2     null
     627716294     ROW        X     QPID_MESSAGE_CONTENT      (81,85)
GRANT     T     1     null
2022-09-30T20:49:12,644Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_message_metadata

2022-09-30T20:49:32,682Z ERROR [IO-/147.18.139.173:44710]
(o.a.q.s.u.ServerScopedRuntimeException) - Exception on enqueuing message
into message store25904045
java.sql.SQLTransactionRollbackException: A lock could not be obtained due
to a deadlock, cycle of locks and waiters is:
Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
  Waiting XID : {627716335, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
message_id , meta_data ) values (?, ?)
  Granted XID : {627716296, X}
Lock : ROW, SYSCONGLOMERATES, (7,8)
  Waiting XID : {627716296, X} , APP, alter table
"APP"."QPID_MESSAGE_METADATA" compress
  Granted XID : {627716296, S} , {627716335, S}
. The selected victim is XID : 627716335.
at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360)
at
org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405)
at
org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41)
at
org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.storeMetaData(AbstractJDBCMessageStore.java:981)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$1200(AbstractJDBCMessageStore.java:81)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$StoredJDBCMessage.store(AbstractJDBCMessageStore.java:1635)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$enqueueMessage$1(AbstractJDBCMessageStore.java:1240)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278)
at
org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)

On Fri, Sep 30, 2022 at 2:46 PM Rick Hillegas <[email protected]>
wrote:

> I don't have any theories yet about what is going on. Can you try
> issuing the following statement just before you compress the table:
>
> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> I don't understand why the INSERT transaction grabs a lock on a
> SYSCONGLOMERATES row.
>
> Thanks,
> -Rick
>
>
> On 9/29/22 1:24 PM, Nathan Jensen wrote:
> > Thanks for the response!  To clarify a few things, we're using Derby
> > 10.15.2.0 and Java 11.  Our project is sticking with Java 11 for the time
> > being so I can't upgrade to Derby 10.16 at this time.  The version of
> Qpid
> > we're using at present is 7.1.12.
> >
> > I don't see anything suspicious regarding the INSERT statement, but maybe
> > you could see something that I don't.  I believe the relevant code is
> here:
> >
> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675
> > That does not commit the transaction, the transaction is committed
> > elsewhere in the same file, here:
> >
> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264
> > I should also note that one time I saw the deadlock happen with a DELETE
> > statement instead of an INSERT, so I don't think the unexpected row
> locking
> > in SYSCONGLOMERATES is limited to INSERT.
> >
> > Here is the DDL as produced by Derby's dblook:
> > -- ----------------------------------------------
> > -- DDL Statements for tables
> > -- ----------------------------------------------
> >
> > CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL,
> > "MESSAGE_ID" BIGINT NOT NULL);
> >
> > CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL,
> > "CONTENT" BLOB(2147483647));
> >
> > CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL,
> > "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64)
> > FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID"
> > VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);
> >
> > CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL,
> > "REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647),
> > "LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET"
> BLOB(2147483647));
> >
> > CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL,
> > "META_DATA" BLOB(2147483647));
> >
> > CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL);
> >
> > CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID"
> > VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA
> > NOT NULL);
> >
> > CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT
> > NULL, "VERSION_TIME" TIMESTAMP);
> >
> > -- ----------------------------------------------
> > -- DDL Statements for keys
> > -- ----------------------------------------------
> >
> > -- PRIMARY/UNIQUE
> > ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT
> > "SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("LINK_KEY");
> >
> > ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT
> > "SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("MESSAGE_ID");
> >
> > ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT
> > "SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("QUEUE_ID", "MESSAGE_ID");
> >
> > ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT
> > "SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("FORMAT",
> > "GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID");
> >
> > ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT
> > "SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("FORMAT",
> > "GLOBAL_ID", "BRANCH_ID");
> >
> > ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT
> > "SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("MESSAGE_ID");
> >
> > ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT
> > "SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("VERSION");
> >
> >
> > And just in case I'm doing something stupid, here is the current version
> of
> > my code.  conn.getAutoCommit() is false, but I also tried it with an
> > autocommit connection and without the commit statements that time.  The
> > code is inside a try-catch-finally that is inside a TimerTask.
> >
> > conn = newConnection();
> > LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit());
> > LOGGER.info("Compressing qpid_queue_entries");
> > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> > 'QPID_QUEUE_ENTRIES', 0)").executeUpdate();
> > conn.commit();
> > LOGGER.info("Compressing qpid_message_metadata");
> > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> > 'QPID_MESSAGE_METADATA', 0)").executeUpdate();
> > conn.commit();
> > LOGGER.info("Compressing qpid_message_content");
> > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> > 'QPID_MESSAGE_CONTENT', 0)").executeUpdate();
> > conn.commit();
> >
> > Thanks for any help you can provide!
> >
> > On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <[email protected]>
> > wrote:
> >
> >> I'm not an expert on the table compression code, but it does need an
> >> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
> >>
> >> If I'm interpreting the error correctly, the table compressor is blocked
> >> trying to escalate its lock on the SYSCONGLOMERATES row from shared to
> >> exclusive. But it can't do this because the INSERT transaction holds a
> >> shared lock on the SYSCONGLOMERATES row. I don't understand why the
> INSERT
> >> transaction would need to lock a SYSCONGLOMERATES row. The following
> simple
> >> experiment shows that, by itself, a simple INSERT should not lock any
> rows
> >> in SYSCONGLOMERATES:
> >>
> >> ij version 10.17
> >>
> >> ij> CONNECT 'jdbc:derby:memory:db;create=true';
> >>
> >> ij> AUTOCOMMIT OFF;
> >>
> >> ij> CREATE TABLE t(a INT);
> >>
> >> 0 rows inserted/updated/deleted
> >>
> >> ij> CREATE INDEX t_idx ON t(a);
> >>
> >> 0 rows inserted/updated/deleted
> >>
> >> ij> CREATE TABLE s(a INT);
> >>
> >> 0 rows inserted/updated/deleted
> >>
> >> ij> INSERT INTO s VALUES (1);
> >>
> >> 1 row inserted/updated/deleted
> >>
> >> ij> COMMIT;
> >>
> >> ij> INSERT INTO t SELECT * FROM s;
> >>
> >> 1 row inserted/updated/deleted
> >>
> >> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> >>
> >> XID            |TYPE |MODE|TABLENAME
>
>        |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
> >>
> >>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> 186            |TABLE|IX  |T
>
>        |Tablelock           |GRANT|T        |2    |NULL
> >>
> >> 186            |ROW  |X   |T
>
>        |(1,7)               |GRANT|T        |1    |NULL
> >>
> >> 2 rows selected
> >>
> >> ij> COMMIT;
> >>
> >> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> >>
> >> XID            |TYPE |MODE|TABLENAME
>
>        |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
> >>
> >>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> 0 rows selected
> >>
> >>
> >> Is there anything else going on in the INSERT transaction? Can you share
> >> the table DDL with us, including indexes, constraints, auto-generated
> >> columns, etc.?
> >>
> >> -Rick
> >>
> >>
> >> On 9/28/22 2:12 PM, Nathan Jensen wrote:
> >>
> >> Hi, we are using Derby as the backing message store for Qpid-Broker-J.
> We
> >> have a continuous data flow into Qpid so each message gets persisted to
> the
> >> Derby message store as data comes in and then it is deleted from the
> Derby
> >> message store as data is picked up and processed.  The data flow is
> pretty
> >> constant, there are peaks and valleys but basically data is always
> coming
> >> in so there are LOTS of inserts and deletes.  Over time we found that
> the
> >> Derby message store fills up the disk drive until it runs out of space
> and
> >> in general the workaround to that has been to stop Qpid and remove the
> >> message store (i.e. the Derby directory), which loses data.
> >>
> >> My attempted solution to this was to extend Qpid's DerbyMessageStore
> with a
> >> DerbyRepackingMessageStore that executes the
> >> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However,
> on a
> >> test system with flowing data, this always has a deadlock exception and
> >> some data (the aborted transaction) is lost.  An example is:
> >>
> >> Caused by: org.apache.derby.shared.common.error.StandardException: A
> lock
> >> could not be obtained due to a deadlock, cycle of locks and waiters is:
> >> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
> >>    Waiting XID : {501214885, IX} , APP, INSERT INTO
> QPID_MESSAGE_METADATA(
> >> message_id , meta_data ) values (?, ?)
> >>    Granted XID : {501214829, X}
> >> Lock : ROW, SYSCONGLOMERATES, (7,8)
> >>    Waiting XID : {501214829, X} , APP, alter table
> >> "APP"."QPID_MESSAGE_METADATA" compress
> >>    Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> >> . The selected victim is XID : 501214885.
> >>          at
> >>
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
> >>          at
> >>
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
> >>          at
> >>
> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
> >>          at
> >>
> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
> >>
> >> My understanding of that is that the deadlock is occurring because the
> >> insert has a shared lock on the sysconglomerates row and wants an
> exclusive
> >> lock on the qpid_message_metadata table, while the compress has an
> >> exclusive lock on the qpid_message_metadata table and wants an exclusive
> >> lock on the sysconglomerates row, so they block each other.
> >>
> >> I found this thread from May of 2009 which had a similar issue:
> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
> >> expect the compress command to go for the lock(s), wait until it gets
> the
> >> lock(s), then block the other threads and do the compress.  But it
> >> deadlocks instead.
> >>
> >> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683,
> I
> >> tried using a non-autocommit connection and locking the table
> exclusively
> >> before calling compress and then commit, but that got the same errors.
> >> Based on this thread,
> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> >> the in place compress and it succeeds without deadlock.  However, it is
> >> significantly slower, too slow to be a workable solution for my use case
> >> and it also doesn't return as much space to the operating system.
> >>
> >> Does this look like a Derby bug where the compress command will
> deadlock if
> >> there are inserts happening at roughly the same time?  Or is this
> >> expected?  Do both the inserts and compress need a sysconglomerates row
> >> lock?  Is there a way the compress command could get both its locks at
> the
> >> same time to avoid the deadlock?
> >>
> >>
> >>
> >>
>
>

Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Reply via email to