[jira] [Comment Edited] (ARTEMIS-2586) Infinite Block in AMQ212054 after transient DB-error

Rico Neubauer (Jira) Sat, 04 Jan 2020 02:40:38 -0800


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007974#comment-17007974
 ]


Rico Neubauer edited comment on ARTEMIS-2586 at 1/4/20 10:39 AM:
-----------------------------------------------------------------

The config is attached now; sorry, I missed it.

I just used "AMQP" as the component since the logged error has it in its name. You are right: we are using the core protocol.

The exceptions originate in the JDBC driver's network layer while MessageReceiverBase#onMessage is trying to commit. MessageReceiverBase handles the exception and then tries to send the message to the DLQ.
Looking at it again, this handling may itself be what leads to the credit exhaustion. It still runs inside our #onMessage code, i.e. nothing has yet been returned or thrown back to the JMS server while the blocking send is in progress.

For example:
{noformat}
"Thread-83 (ActiveMQ-client-global-threads)" Id=1727 in TIMED_WAITING on lock=java.util.concurrent.Semaphore$NonfairSync@85eb557
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1332)
    at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:582)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.actualAcquire(ClientProducerCreditsImpl.java:73)
    at org.apache.activemq.artemis.core.client.impl.AbstractProducerCreditsImpl.acquireCredits(AbstractProducerCreditsImpl.java:77)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.sendRegularMessage(ClientProducerImpl.java:301)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:275)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:128)
    at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.doSendx(ActiveMQMessageProducer.java:485)
    at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:195)
    at com.company.engine.jms.MessageReceiverBase.sendToDLQ(MessageReceiverBase.java:571)
    at com.company.engine.jms.MessageReceiverBase.handleException(MessageReceiverBase.java:493)
    at com.company.engine.jms.MessageReceiverBase.onMessage(MessageReceiverBase.java:387)
{noformat}
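To make the suspected circular wait concrete, here is a minimal, hypothetical Java model. It uses only java.util.concurrent and no Artemis classes. It assumes, purely for illustration, that whatever would free DLQ credits can only run on the listener thread after onMessage has returned. Whether Artemis actually behaves this way is exactly the open question. `commit()`, the semaphore and both executors are invented stand-ins; only the roles of handleException/sendToDLQ are taken from the stack trace above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the suspected circular wait; not Artemis code.
class ListenerCreditWaitModel {

    // Stand-in for the commit that fails with the JDBC network error.
    static void commit() {
        throw new IllegalStateException("Error during two phase commit");
    }

    // DLQ send runs inline in onMessage (like handleException -> sendToDLQ).
    // Assumption: the credit grant is queued on the same single listener thread.
    // Returns true if the message was handled without the DLQ send timing out.
    static boolean sendToDlqInline(long waitMillis) throws Exception {
        Semaphore dlqCredits = new Semaphore(0);
        ExecutorService listenerThread = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> onMessage = () -> {
                // Queued behind this task, so it can only run after onMessage returns.
                listenerThread.execute(dlqCredits::release);
                try {
                    commit();
                    return true; // commit succeeded: nothing to send to the DLQ
                } catch (IllegalStateException e) {
                    // Blocks the listener thread while the grant waits behind it.
                    return dlqCredits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
                }
            };
            return listenerThread.submit(onMessage).get();
        } finally {
            listenerThread.shutdownNow();
        }
    }

    // Same scenario, but onMessage hands the blocking send to another thread and
    // returns at once, so the queued credit grant can run on the listener thread.
    static boolean sendToDlqDeferred(long waitMillis) throws Exception {
        Semaphore dlqCredits = new Semaphore(0);
        ExecutorService listenerThread = Executors.newSingleThreadExecutor();
        ExecutorService dlqSender = Executors.newSingleThreadExecutor();
        try {
            Callable<Future<Boolean>> onMessage = () -> {
                listenerThread.execute(dlqCredits::release);
                try {
                    commit();
                    return null; // commit succeeded: nothing to send to the DLQ
                } catch (IllegalStateException e) {
                    return dlqSender.submit(
                            () -> dlqCredits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS));
                }
            };
            Future<Boolean> pendingSend = listenerThread.submit(onMessage).get();
            return pendingSend == null || pendingSend.get();
        } finally {
            listenerThread.shutdownNow();
            dlqSender.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("inline DLQ send completed:   " + sendToDlqInline(200));
        System.out.println("deferred DLQ send completed: " + sendToDlqDeferred(1000));
    }
}
```

Under these assumptions, the inline variant times out and returns false, while the deferred variant obtains the credit and returns true. The deferred variant only illustrates the general idea of not blocking inside onMessage. It is not a confirmed fix for Artemis or for MessageReceiverBase.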



> Infinite Block in AMQ212054 after transient DB-error
> -----------------------------------------------------
>
>                 Key: ARTEMIS-2586
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2586
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: AMQP
>    Affects Versions: 2.10.1
>         Environment: Ubuntu 18.04 and Oracle DB, but I don't think this is very relevant to the issue.
>            Reporter: Rico Neubauer
>            Priority: Major
>         Attachments: 2019-11-28_threaddump_01.txt, 
> 2019-12-04_threaddump_01.txt, Message-Counts.png, artemis.xml, 
> initial-error.txt, log-extract.txt, writerIndex-Credits.PNG
>
>
> Hi,
> I would like to describe a rather severe situation that we experienced in a long-running test on 2 out of 3 instances/machines.
> We are running Karaf with Artemis 2.10.1.
> After some time (see screenshot), first one instance and then, after a while, a second instance came to a complete stop.
> Looking into the logs and thread dumps revealed the following (same for both instances):
>  # There was a temporary problem connecting to the DB ({{connection reset by peer}} and {{Closed Connection}}).
>  # Due to the handling on our side, this resulted in an {{IllegalStateException}}/{{Error during two phase commit}} being thrown back to Artemis.
>  # After this, no messaging is possible at all anymore, and the following log line repeats:
> {noformat}
> AMQ212054: Destination address=DLQ is blocked. If the system is configured to block make sure you consume messages on this configuration.{noformat}
> (The system is not configured to block; see the attached config.)
> The message comes from threads like the following, which are trying to obtain credits for sending:
>  
> {noformat}
> "Thread-93 (ActiveMQ-client-global-threads)" Id=2001 in TIMED_WAITING on lock=java.util.concurrent.Semaphore$NonfairSync@1f9a57e0
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1332)
>  at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:582)
>  at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.actualAcquire(ClientProducerCreditsImpl.java:73)
>  at org.apache.activemq.artemis.core.client.impl.AbstractProducerCreditsImpl.acquireCredits(AbstractProducerCreditsImpl.java:77)
>  at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.sendRegularMessage(ClientProducerImpl.java:301)
>  at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:275)
>  at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:128)
>  at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.doSendx(ActiveMQMessageProducer.java:485)
>  at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:195)
>  at com.seeburger.engine.jms.MessageReceiverBase.sendToDLQ(MessageReceiverBase.java:571)
>  at com.seeburger.engine.jms.MessageReceiverBase.handleException(MessageReceiverBase.java:493)
>  at com.seeburger.engine.jms.MessageReceiverBase.onMessage(MessageReceiverBase.java:387)
>  at org.apache.activemq.artemis.jms.client.JMSMessageListenerWrapper.onMessage(JMSMessageListenerWrapper.java:110)
>  at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1031)
>  at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:50)
>  at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1154)
>  at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
>  at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
>  at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
>  at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$431/1769898766.run(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> Locked synchronizers: count = 1
>  - java.util.concurrent.ThreadPoolExecutor$Worker@bc49fcf
> {noformat}
> These attempts will never succeed, since the credits do not seem to suffice (see the heap-dump screenshot).
> From my point of view, the thrown IllegalStateException should not drive the system into this non-recoverable state. What do you think: is there something that could be improved?
>  
> [Fastthread-Link|https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjAvMDEvMy8tLTIwMTktMTItMDRfdGhyZWFkZHVtcF8wMS50eHQtLTEzLTM4LTE1OzstLTIwMTktMTEtMjhfdGhyZWFkZHVtcF8wMS50eHQtLTEzLTM4LTE1]
> In case it helps: the 2 instances are still in this state (since September), and I can fetch additional information or debug them on request.
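In the dump above, the blocked threads sit in `Semaphore.tryAcquire` with a timeout, called from `ClientProducerCreditsImpl.actualAcquire`, and the AMQ212054 line keeps repeating. The sketch below is a simplified, hypothetical model of such a wait loop. It is inferred only from that stack trace and the repeating log line and is not taken from the Artemis source. The class name, the credit amounts (100, 1024), the timeouts and the `maxAttempts` bound are all assumptions. The model shows why the wait cannot end if the request exceeds the available credits and no further credits are ever granted.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a producer waiting for flow-control credits; not Artemis code.
class CreditWaitModel {

    private final Semaphore credits;
    private int blockedWarnings = 0;

    CreditWaitModel(int initialCredits) {
        this.credits = new Semaphore(initialCredits);
    }

    // Stand-in for the broker granting more credits to the producer.
    void grantCredits(int n) {
        credits.release(n);
    }

    // Retries a timed tryAcquire and logs a warning after each timeout.
    // maxAttempts exists only so the model terminates; the repeating AMQ212054
    // log suggests the real client keeps retrying.
    boolean acquire(int needed, long timeoutMillis, int maxAttempts) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (credits.tryAcquire(needed, timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            blockedWarnings++;
            System.out.println("modelled AMQ212054: destination address=DLQ is blocked");
        }
        return false;
    }

    int blockedWarnings() {
        return blockedWarnings;
    }

    public static void main(String[] args) throws InterruptedException {
        // Needs 1024 credits, only 100 exist and none are ever granted: every attempt times out.
        CreditWaitModel starved = new CreditWaitModel(100);
        boolean starvedOk = starved.acquire(1024, 50, 3);
        System.out.println("starved: " + starvedOk + ", warnings=" + starved.blockedWarnings());

        // Same request, but a grant arrives while the sender is waiting.
        CreditWaitModel replenished = new CreditWaitModel(100);
        new Thread(() -> replenished.grantCredits(1024)).start();
        boolean replenishedOk = replenished.acquire(1024, 500, 3);
        System.out.println("replenished: " + replenishedOk + ", warnings=" + replenished.blockedWarnings());
    }
}
```

In this model, the only way out of the loop is for more credits to be released. That matches the observation that the sends never complete once credits stop arriving.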



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
