A wild guess: if the INSERT transaction doesn't commit immediately,
but instead inserts a batch of rows before committing them all, then
the transaction may grow quite large. Once enough row locks have been
taken by the individual inserts, Derby escalates from row locking to
table locking, and that escalation may be what causes the deadlock.
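
If that's what is happening, note that Derby's escalation point is
tunable: the derby.locks.escalationThreshold property (default 5000)
controls how many locks a transaction may hold before Derby tries to
escalate to a table lock. As a quick experiment (a hedged JDBC sketch;
the JDBC URL is just a placeholder), you could raise it and see
whether the deadlock disappears:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class RaiseEscalationThreshold {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at the real message-store database.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:qpidstore");
             CallableStatement cs = conn.prepareCall(
                 "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)")) {
            cs.setString(1, "derby.locks.escalationThreshold");
            cs.setString(2, "50000"); // arbitrary value well above the default
            cs.execute();
        }
    }
}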

As an experiment, try committing the INSERT transaction more
frequently, even (as an extreme) committing after every row inserted,
and see if that avoids the deadlock.
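
In JDBC terms, the extreme version of that experiment looks something
like this (a sketch only; the Message type, the batch source, and the
JDBC URL are stand-ins for whatever the store actually uses):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CommitPerRow {
    // Hypothetical record standing in for a queued message.
    record Message(long id, byte[] metaData) {}

    static void insertAll(Iterable<Message> batch) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:qpidstore");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO QPID_MESSAGE_METADATA (message_id, meta_data) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            for (Message m : batch) {
                ps.setLong(1, m.id());
                ps.setBytes(2, m.metaData());
                ps.executeUpdate();
                conn.commit(); // commit every row: locks never accumulate
            }
        }
    }
}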

Switching the discussion slightly: instead of trying to re-use the
same table by deleting rows and compressing in a way that keeps pace
with the insert load, one technique people often use is to maintain
multiple tables. The inserter switches to an alternate table after
some time; the cleaning program then deletes the old rows and
compresses the now-idle table, and the two tasks flip back later
(see the sketch below).

So maybe you could enhance QPID's DerbyMessageStore to use a
collection of tables rather than a single one, arranging for the
compress work to always operate on a different table than the insert
work.
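
A rough sketch of what that rotation could look like (all the names
here are hypothetical, and the real logic would have to live inside
DerbyMessageStore and coordinate with the consumers):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Statement;

public class RotatingTables {
    // Hypothetical pair of message tables the store alternates between.
    private final String[] tables = {"QPID_MSG_A", "QPID_MSG_B"};
    private volatile int active = 0;

    // Inserts always target the active table.
    public String insertTable() {
        return tables[active];
    }

    // Flip to the other table, then clean and compress the idle one,
    // so compress never touches the table the inserter is using.
    public void rotateAndCompress(Connection conn) throws Exception {
        int idle = active;
        active = 1 - active;
        conn.setAutoCommit(false);
        try (Statement s = conn.createStatement()) {
            // Assumes consumers have already drained the idle table.
            s.executeUpdate("DELETE FROM " + tables[idle]);
        }
        conn.commit();
        try (CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)")) {
            cs.setString(1, "APP");
            cs.setString(2, tables[idle]);
            cs.setShort(3, (short) 1); // non-zero = rebuild sequentially
            cs.execute();
        }
        conn.commit();
    }
}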

bryan

On Thu, Sep 29, 2022 at 11:28 AM Rick Hillegas <rick.hille...@gmail.com> wrote:
>
> I'm not an expert on the table compression code, but it does need an overhaul 
> (see https://issues.apache.org/jira/browse/DERBY-3683).
>
> If I'm interpreting the error correctly, the table compressor is blocked 
> trying to escalate its lock on the SYSCONGLOMERATES row from shared to 
> exclusive. But it can't do this because the INSERT transaction holds a shared 
> lock on the SYSCONGLOMERATES row. I don't understand why the INSERT 
> transaction would need to lock a SYSCONGLOMERATES row. The following simple 
> experiment shows that, by itself, a simple INSERT should not lock any rows in 
> SYSCONGLOMERATES:
>
> ij version 10.17
>
> ij> CONNECT 'jdbc:derby:memory:db;create=true';
>
> ij> AUTOCOMMIT OFF;
>
> ij> CREATE TABLE t(a INT);
>
> 0 rows inserted/updated/deleted
>
> ij> CREATE INDEX t_idx ON t(a);
>
> 0 rows inserted/updated/deleted
>
> ij> CREATE TABLE s(a INT);
>
> 0 rows inserted/updated/deleted
>
> ij> INSERT INTO s VALUES (1);
>
> 1 row inserted/updated/deleted
>
> ij> COMMIT;
>
> ij> INSERT INTO t SELECT * FROM s;
>
> 1 row inserted/updated/deleted
>
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME
> -------------------------------------------------------------------
> 186 |TABLE|IX  |T        |Tablelock|GRANT|T        |2    |NULL
> 186 |ROW  |X   |T        |(1,7)    |GRANT|T        |1    |NULL
>
> 2 rows selected
>
> ij> COMMIT;
>
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> XID |TYPE |MODE|TABLENAME|LOCKNAME |STATE|TABLETYPE|LOCK&|INDEXNAME
> -------------------------------------------------------------------
>
> 0 rows selected
>
>
> Is there anything else going on in the INSERT transaction? Can you share the 
> table DDL with us, including indexes, constraints, auto-generated columns, 
> etc.?
>
> -Rick
>
>
> On 9/28/22 2:12 PM, Nathan Jensen wrote:
>
> Hi, we are using Derby as the backing message store for Qpid-Broker-J.  We
> have a continuous data flow into Qpid so each message gets persisted to the
> Derby message store as data comes in and then it is deleted from the Derby
> message store as data is picked up and processed.  The data flow is pretty
> constant, there are peaks and valleys but basically data is always coming
> in so there are LOTS of inserts and deletes.  Over time we found that the
> Derby message store fills up the disk drive until it runs out of space;
> the workaround so far has been to stop Qpid and remove the message store
> (i.e. the Derby directory), which loses data.
>
> My attempted solution to this was to extend Qpid's DerbyMessageStore with a
> DerbyRepackingMessageStore that executes the
> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However, on a
> test system with flowing data, this always hits a deadlock exception, and
> some data (the contents of the aborted transaction) is lost.  An example is:
>
> Caused by: org.apache.derby.shared.common.error.StandardException: A lock
> could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>   Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
> message_id , meta_data ) values (?, ?)
>   Granted XID : {501214829, X}
> Lock : ROW, SYSCONGLOMERATES, (7,8)
>   Waiting XID : {501214829, X} , APP, alter table
> "APP"."QPID_MESSAGE_METADATA" compress
>   Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> . The selected victim is XID : 501214885.
>         at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
>         at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
>         at
> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
>         at
> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
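>
> The repacking task itself boils down to roughly this (a simplified
> sketch of what my subclass does; the interval and URL are just
> placeholders):
>
> import java.sql.CallableStatement;
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.util.concurrent.Executors;
> import java.util.concurrent.ScheduledExecutorService;
> import java.util.concurrent.TimeUnit;
>
> public class RepackTimer {
>     public static void start(String storeUrl) {
>         ScheduledExecutorService timer =
>                 Executors.newSingleThreadScheduledExecutor();
>         timer.scheduleAtFixedRate(() -> {
>             try (Connection conn = DriverManager.getConnection(storeUrl);
>                  CallableStatement cs = conn.prepareCall(
>                      "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_MESSAGE_METADATA', 1)")) {
>                 cs.execute();
>             } catch (Exception e) {
>                 // the deadlock exception above surfaces here
>             }
>         }, 1, 1, TimeUnit.HOURS);
>     }
> }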
>
> My understanding is that the deadlock occurs because the insert holds a
> shared lock on the SYSCONGLOMERATES row and wants an intent-exclusive (IX)
> lock on the QPID_MESSAGE_METADATA table, while the compress holds an
> exclusive lock on the QPID_MESSAGE_METADATA table and wants an exclusive
> lock on the SYSCONGLOMERATES row, so they block each other.
>
> I found this thread from May 2009 which describes a similar issue:
> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
> expect the compress command to request its lock(s), wait until it gets
> them, then block the other threads and do the compress.  But it deadlocks
> instead.
>
> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683, I
> tried using a non-autocommit connection, locking the table exclusively
> before calling compress, and then committing, but that got the same errors.
> Based on this thread,
> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> the in-place compress and it succeeds without deadlock.  However, it is
> significantly slower, too slow to be workable for my use case, and it
> doesn't return as much space to the operating system.
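>
> For reference, the two attempts look roughly like this (JDBC sketch;
> connection setup omitted):
>
> // Attempt 1 (per DERBY-3683): exclusive table lock, then compress.
> conn.setAutoCommit(false);
> try (Statement s = conn.createStatement()) {
>     s.execute("LOCK TABLE QPID_MESSAGE_METADATA IN EXCLUSIVE MODE");
>     s.execute("CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'QPID_MESSAGE_METADATA', 1)");
> }
> conn.commit(); // still hits the same deadlock
>
> // Attempt 2: in-place compress (purge, defragment, truncate).
> // No deadlock, but too slow and frees less disk space.
> try (Statement s = conn.createStatement()) {
>     s.execute("CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP', 'QPID_MESSAGE_METADATA', 1, 1, 1)");
> }
> conn.commit();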
>
> Does this look like a Derby bug where the compress command will deadlock if
> there are inserts happening at roughly the same time?  Or is this
> expected?  Do both the inserts and compress need a sysconglomerates row
> lock?  Is there a way the compress command could get both its locks at the
> same time to avoid the deadlock?
>
>
