Tomas Hofman created ARTEMIS-3037:
-------------------------------------

             Summary: JournalImpl#checkKnownRecordID() implementation can leave a thread hanging in WAITING state
                 Key: ARTEMIS-3037
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3037
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: Broker
    Affects Versions: 2.16.0, 2.9.0
            Reporter: Tomas Hofman


The {{JournalImpl#checkKnownRecordID()}} implementation contains the following code:

{code}
      final SimpleFuture<Boolean> known = new SimpleFutureImpl<>();

      // retry on the append thread. maybe the appender thread is not keeping up.
      appendExecutor.execute(new Runnable() {
         @Override
         public void run() {
            journalLock.readLock().lock();
            try {

               known.set(records.containsKey(id)
                  || pendingRecords.contains(id)
                  || (compactor != null && compactor.containsRecord(id)));
            } finally {
               journalLock.readLock().unlock();
            }
         }
      });

      if (!known.get()) {
          ...
      }
{code}

If the code in the Runnable fails with an exception before the {{known}} future 
value is set, the calling thread is left in the WAITING state forever, because 
{{known.get()}} never returns. Exception handling should be added that cancels 
the future if an exception occurs.
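
The kind of exception handling suggested above could look roughly like the 
following. This is only a minimal sketch, not the actual Artemis fix: it uses 
{{java.util.concurrent.CompletableFuture}} as a stand-in for {{SimpleFuture}}, 
and the class {{KnownRecordCheckSketch}}, the {{RecordLookup}} interface and 
the {{await}} helper with its 5-second timeout are hypothetical names 
introduced for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class KnownRecordCheckSketch {

    /** Hypothetical stand-in for the records / pendingRecords / compactor lookups. */
    interface RecordLookup {
        boolean contains(long id);
    }

    /** Runs the lookup on the append executor; the future is always completed. */
    static CompletableFuture<Boolean> checkKnown(Executor appendExecutor,
                                                 ReadWriteLock journalLock,
                                                 RecordLookup lookup,
                                                 long id) {
        final CompletableFuture<Boolean> known = new CompletableFuture<>();
        appendExecutor.execute(() -> {
            journalLock.readLock().lock();
            try {
                known.complete(lookup.contains(id));
            } catch (Throwable t) {
                // Propagate the failure so the waiting thread is released.
                known.completeExceptionally(t);
            } finally {
                journalLock.readLock().unlock();
            }
        });
        return known;
    }

    /** Waits with a bound (assumed 5 seconds) instead of parking indefinitely. */
    static String await(CompletableFuture<Boolean> known) {
        try {
            return "known=" + known.get(5, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        ExecutorService appendExecutor = Executors.newSingleThreadExecutor();
        ReadWriteLock journalLock = new ReentrantReadWriteLock();
        // Normal case: the lookup succeeds and the result is delivered.
        System.out.println(await(checkKnown(appendExecutor, journalLock,
                recordId -> recordId == 42L, 42L)));
        // Failure case: the lookup throws, and the caller sees the failure.
        System.out.println(await(checkKnown(appendExecutor, journalLock,
                recordId -> { throw new IllegalStateException("boom"); }, 7L)));
        appendExecutor.shutdown();
    }
}
```

With the catch block in place, the caller gets an {{ExecutionException}} 
instead of waiting forever. The bounded {{get}} is an extra safeguard for the 
case where the task is never run at all; whether a timeout is appropriate for 
the journal is a separate design question.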

We've observed cases where the following threads were left hanging while no 
other threads were operating inside JournalImpl. I believe the 
{{JournalImpl#checkKnownRecordID()}} implementation may be responsible for that:

{code}
"Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@423fe5c3)" #1078 prio=5 os_prio=64 tid=0x000000011c34a000 nid=0x4eb waiting on condition [0xfffffffabe9ad000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0xfffffffbe73c29e8> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62)
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1080)
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:950)
        at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:361)
        at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1390)
        - locked <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
        at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1336)
        at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:980)
        at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:871)
        at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:2045)
        - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
        at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:1989)
        - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
        at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.sendContinuations(ServerSessionPacketHandler.java:1034)
        - locked <0xfffffffb1962b900> (a java.lang.Object)
        at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:312)
        at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:285)
        at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler$$Lambda$651/2097400985.onMessage(Unknown Source)
        at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source)
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

   Locked ownable synchronizers:
        - <0xfffffffba1800ca0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}

{code}
"Thread-82 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7@3bde9e44)" #2130 prio=5 os_prio=64 tid=0x000000017b6df800 nid=0x907 waiting for monitor entry [0xffffffff045de000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getEncodeSize(LargeServerMessageImpl.java:178)
        - waiting to lock <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
        at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:59)
        at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:25)
        at org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.getEncodeSize(JournalAddRecord.java:79)
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2792)
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:91)
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:850)
{code}



