Thanks for the response! To clarify a few things, we're using Derby 10.15.2.0 and Java 11. Our project is sticking with Java 11 for the time being so I can't upgrade to Derby 10.16 at this time. The version of Qpid we're using at present is 7.1.12.
I don't see anything suspicious regarding the INSERT statement, but maybe you could see something that I don't. I believe the relevant code is here:

https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675

That does not commit the transaction; the transaction is committed elsewhere in the same file, here:

https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264

I should also note that one time I saw the deadlock happen with a DELETE statement instead of an INSERT, so I don't think the unexpected row locking in SYSCONGLOMERATES is limited to INSERT.

Here is the DDL as produced by Derby's dblook:

-- ----------------------------------------------
-- DDL Statements for tables
-- ----------------------------------------------

CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);

CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL, "CONTENT" BLOB(2147483647));

CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID" VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);

CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL, "REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647), "LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET" BLOB(2147483647));

CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL, "META_DATA" BLOB(2147483647));

CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL);

CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA NOT NULL);

CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT NULL, "VERSION_TIME" TIMESTAMP);

-- ----------------------------------------------
-- DDL Statements for keys
-- ----------------------------------------------
-- PRIMARY/UNIQUE

ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT "SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY ("LINK_KEY");

ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT "SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY ("MESSAGE_ID");

ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT "SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY ("QUEUE_ID", "MESSAGE_ID");

ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT "SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT", "GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID");

ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT "SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT", "GLOBAL_ID", "BRANCH_ID");

ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT "SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY ("MESSAGE_ID");

ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT "SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY ("VERSION");

And just in case I'm doing something stupid, here is the current version of my code. conn.getAutoCommit() is false, but I also tried it with an autocommit connection and without the commit statements that time. The code is inside a try-catch-finally that is inside a TimerTask.
conn = newConnection();
LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit());

LOGGER.info("Compressing qpid_queue_entries");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_QUEUE_ENTRIES', 0)").executeUpdate();
conn.commit();

LOGGER.info("Compressing qpid_message_metadata");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_MESSAGE_METADATA', 0)").executeUpdate();
conn.commit();

LOGGER.info("Compressing qpid_message_content");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_MESSAGE_CONTENT', 0)").executeUpdate();
conn.commit();

Thanks for any help you can provide!

On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <rick.hille...@gmail.com> wrote:

> I'm not an expert on the table compression code, but it does need an
> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
>
> If I'm interpreting the error correctly, the table compressor is blocked
> trying to escalate its lock on the SYSCONGLOMERATES row from shared to
> exclusive. But it can't do this because the INSERT transaction holds a
> shared lock on the SYSCONGLOMERATES row. I don't understand why the
> INSERT transaction would need to lock a SYSCONGLOMERATES row.
> The following simple experiment shows that, by itself, a simple INSERT
> should not lock any rows in SYSCONGLOMERATES:
>
> ij version 10.17
> ij> CONNECT 'jdbc:derby:memory:db;create=true';
> ij> AUTOCOMMIT OFF;
> ij> CREATE TABLE t(a INT);
> 0 rows inserted/updated/deleted
> ij> CREATE INDEX t_idx ON t(a);
> 0 rows inserted/updated/deleted
> ij> CREATE TABLE s(a INT);
> 0 rows inserted/updated/deleted
> ij> INSERT INTO s VALUES (1);
> 1 row inserted/updated/deleted
> ij> COMMIT;
> ij> INSERT INTO t SELECT * FROM s;
> 1 row inserted/updated/deleted
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME
> -------------------------------------------------------------------
> 186 |TABLE|IX  |T        |Tablelock|GRANT|T        |2    |NULL
> 186 |ROW  |X   |T        |(1,7)    |GRANT|T        |1    |NULL
> 2 rows selected
> ij> COMMIT;
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME
> -------------------------------------------------------------------
> 0 rows selected
>
> Is there anything else going on in the INSERT transaction? Can you share
> the table DDL with us, including indexes, constraints, auto-generated
> columns, etc.?
>
> -Rick
>
> On 9/28/22 2:12 PM, Nathan Jensen wrote:
> > Hi, we are using Derby as the backing message store for Qpid-Broker-J.
> We have a continuous data flow into Qpid, so each message gets persisted
> to the Derby message store as data comes in and then it is deleted from
> the Derby message store as data is picked up and processed. The data flow
> is pretty constant, there are peaks and valleys, but basically data is
> always coming in so there are LOTS of inserts and deletes. Over time we
> found that the Derby message store fills up the disk drive until it runs
> out of space, and in general the workaround to that has been to stop Qpid
> and remove the message store (i.e. the Derby directory), which loses data.
>
> My attempted solution to this was to extend Qpid's DerbyMessageStore with
> a DerbyRepackingMessageStore that executes the
> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer. However, on a
> test system with flowing data, this always has a deadlock exception and
> some data (the aborted transaction) is lost. An example is:
>
> Caused by: org.apache.derby.shared.common.error.StandardException: A lock
> could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>   Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
>   message_id , meta_data ) values (?, ?)
>   Granted XID : {501214829, X}
> Lock : ROW, SYSCONGLOMERATES, (7,8)
>   Waiting XID : {501214829, X} , APP, alter table
>   "APP"."QPID_MESSAGE_METADATA" compress
>   Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> . The selected victim is XID : 501214885.
> at org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
> at org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
> at org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
> at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
>
> My understanding of that is that the deadlock is occurring because the
> insert has a shared lock on the sysconglomerates row and wants an
> exclusive lock on the qpid_message_metadata table, while the compress has
> an exclusive lock on the qpid_message_metadata table and wants an
> exclusive lock on the sysconglomerates row, so they block each other.
>
> I found this thread from May of 2009 which had a similar issue:
> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f
> I would expect the compress command to go for the lock(s), wait until it
> gets the lock(s), then block the other threads and do the compress. But
> it deadlocks instead.
>
> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683,
> I tried using a non-autocommit connection and locking the table
> exclusively before calling compress and then commit, but that got the
> same errors. Based on this thread,
> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> the in place compress and it succeeds without deadlock. However, it is
> significantly slower, too slow to be a workable solution for my use case,
> and it also doesn't return as much space to the operating system.
>
> Does this look like a Derby bug where the compress command will deadlock
> if there are inserts happening at roughly the same time? Or is this
> expected? Do both the inserts and compress need a sysconglomerates row
> lock? Is there a way the compress command could get both its locks at the
> same time to avoid the deadlock?
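P.S. In the meantime, since Derby aborts only the chosen victim transaction and a compress that loses the race can simply be run again, I've been experimenting with wrapping each compress call in a retry loop keyed off SQLState 40001 (the deadlock-victim code Derby reports). This is just a sketch of the idea, not anything from Derby or Qpid: the SqlAction interface, retryOnDeadlock helper, and the backoff numbers are all my own invention.

```java
import java.sql.SQLException;

public class CompressRetry {

    /** A unit of JDBC work that Derby may abort as a deadlock victim. */
    public interface SqlAction {
        void run() throws SQLException;
    }

    /**
     * Runs the action, retrying up to maxAttempts times when it fails with
     * SQLState 40001 (deadlock victim). Any other SQLException, or running
     * out of attempts, propagates to the caller. Sleeps a little longer
     * before each successive retry (simple linear backoff).
     */
    public static void retryOnDeadlock(SqlAction action, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return;
            } catch (SQLException e) {
                if (!"40001".equals(e.getSQLState()) || attempt >= maxAttempts) {
                    throw e; // not a deadlock, or attempts exhausted
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

Each compress in my TimerTask would then become something like `retryOnDeadlock(() -> { stmt.executeUpdate(); conn.commit(); }, 3, 1000L);`. It doesn't fix the underlying lock escalation, but it should at least stop the compress from being silently abandoned when it loses a deadlock.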