[ https://issues.apache.org/jira/browse/ARTEMIS-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696622#comment-16696622 ]

Alexander Kosarev commented on ARTEMIS-1864:
--------------------------------------------

We have the same issue with both Apache Artemis 2.6.3 and JBoss AMQ 7 (build 2.6.3.redhat-00004).

Cluster configuration: two nodes, each configured as follows:

{code:java}
<configuration>

<connectors>
    <connector name="node01-connector">tcp://192.167.1.10:61616</connector>
</connectors>

<cluster-user>admin</cluster-user>
<cluster-password>admin</cluster-password>

<broadcast-groups>
    <broadcast-group name="my-broadcast-group">
        <group-address>${udp-address:231.7.7.7}</group-address>
        <group-port>9876</group-port>
        <broadcast-period>100</broadcast-period>
        <connector-ref>node02-connector</connector-ref>
    </broadcast-group>
</broadcast-groups>

<discovery-groups>
    <discovery-group name="my-discovery-group">
        <group-address>${udp-address:231.7.7.7}</group-address>
        <group-port>9876</group-port>
        <refresh-timeout>10000</refresh-timeout>
    </discovery-group>
</discovery-groups>

<cluster-connections>
    <cluster-connection name="sandbox-cluster">
        <connector-ref>node02-connector</connector-ref>
        <use-duplicate-detection>true</use-duplicate-detection>
        <max-hops>1</max-hops>
        <discovery-group-ref discovery-group-name="my-discovery-group"/>
    </cluster-connection>
</cluster-connections>

<address-settings>
    <address-setting match="#">
        <redistribution-delay>0</redistribution-delay>
    </address-setting>
</address-settings>

</configuration>

{code}
There are multiple ActiveMQ 5.15.6 JMS clients configured with the failover transport:

{code:java}
failover://(tcp://host1:port,tcp://host2:port){code}
All clients can consume and produce messages.
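
For reference, the clients are set up roughly like this (a minimal sketch using the ActiveMQ 5.x JMS API; the hosts, ports and queue name are placeholders, not our real values):
{code:java}
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClientExample {
    public static void main(String[] args) throws Exception {
        // Failover transport pointing at both cluster nodes (placeholder hosts/ports)
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("failover://(tcp://host1:61616,tcp://host2:61616)");

        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("example.queue"); // placeholder queue name

        // Producers and consumers may end up connected to different nodes,
        // which is what triggers on-demand redistribution between the brokers.
        MessageProducer producer = session.createProducer(queue);
        producer.send(session.createTextMessage("test"));

        MessageConsumer consumer = session.createConsumer(queue);
        System.out.println(consumer.receive(5000));

        connection.close();
    }
}
{code}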

 

 

Messages get stuck in the queue *$.artemis.internal.sf.sandbox-cluster.CLUSTER_NAME.NODE_NAME* at some point, in the "Delivering" state, with the following exception:

{code:java}
2018-11-23 14:36:25,274 WARN [org.apache.activemq.artemis.core.server] AMQ222151: removing consumer which did not handle a message, consumer=ClusterConnectionBridge@299a7489 [name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, queue=QueueImpl[name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=b42fe4db-e636-11e8-b335-6aabda98944e], temp=false]@1584523b targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@299a7489 [name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, queue=QueueImpl[name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=b42fe4db-e636-11e8-b335-6aabda98944e], temp=false]@1584523b targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=node02-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61716&host=10-145-13-120], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@638169719[nodeUUID=b42fe4db-e636-11e8-b335-6aabda98944e, connector=TransportConfiguration(name=node01-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?port=61616&host=10-145-13-120, address=, server=ActiveMQServerImpl::serverUUID=b42fe4db-e636-11e8-b335-6aabda98944e])) [initialConnectors=[TransportConfiguration(name=node02-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61716&host=10-145-13-120], discoveryGroupConfiguration=null]], message=Reference[2151226563]:NON-RELIABLE:CoreMessage[messageID=2151226563,durable=false,userID=null,priority=0, timestamp=0,expiration=0, durable=false, address=ActiveMQ.Advisory.TempQueue,size=1077,properties=TypedProperties[__HDR_BROKER_IN_TIME=1542965785270,_AMQ_ROUTING_TYPE=0,__HDR_GROUP_SEQUENCE=0,__HDR_COMMAND_ID=0,__HDR_DATASTRUCTURE=[0000 0062 0800 0000 0000 0178 0100 2449 443A 616B 6F73 6172 6576 2D33 3933 ... 3535 2D62 6236 352D 3966 3165 6361 3033 3861 3766 0100 0000 0000 0000 0000),_AMQ_DUPL_ID=ID:akosarev-46097-1542964149858-1:1:0:0:21605,__HDR_MESSAGE_ID=[0000 004A 6E00 017B 0100 2349 443A 616B 6F73 6172 6576 2D34 3630 3937 2D31 ... 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 5465 0000 0000 0000 0000),__HDR_DROPPABLE=false,__HDR_ARRIVAL=0,_AMQ_ROUTE_TO$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e=[0000 0000 0035 C0C3),bytesAsLongs(3522755],__HDR_PRODUCER_ID=[0000 0037 7B01 0023 4944 3A61 6B6F 7361 7265 762D 3436 3039 372D 3135 3432 3936 3431 3439 3835 382D 313A 3100 0000 0000 0000 0000 0000 0000 0000 00),JMSType=Advisory]]@1531727971: java.lang.IndexOutOfBoundsException: writerIndex: 4 (expected: readerIndex(0) <= writerIndex <= capacity(0))
 at io.netty.buffer.AbstractByteBuf.writerIndex(AbstractByteBuf.java:118) [netty-all-4.1.25.Final-redhat-00003.jar:4.1.25.Final-redhat-00003]
 at io.netty.buffer.WrappedByteBuf.writerIndex(WrappedByteBuf.java:129) [netty-all-4.1.25.Final-redhat-00003.jar:4.1.25.Final-redhat-00003]
 at org.apache.activemq.artemis.core.buffers.impl.ResetLimitWrappedActiveMQBuffer.writerIndex(ResetLimitWrappedActiveMQBuffer.java:128) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.buffers.impl.ResetLimitWrappedActiveMQBuffer.<init>(ResetLimitWrappedActiveMQBuffer.java:60) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.message.impl.CoreMessage.internalWritableBuffer(CoreMessage.java:367) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.message.impl.CoreMessage.getBodyBuffer(CoreMessage.java:360) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:241) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:128) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.deliverStandardMessage(BridgeImpl.java:743) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:619) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:2983) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2334) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2000(QueueImpl.java:107) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3209) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
 at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]

2018-11-23 14:36:25,280 WARN [org.apache.activemq.artemis.core.server.impl.QueueImpl] null: java.util.NoSuchElementException
 at org.apache.activemq.artemis.utils.collections.PriorityLinkedListImpl$PriorityLinkedListIterator.repeat(PriorityLinkedListImpl.java:172) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2353) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2000(QueueImpl.java:107) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3209) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
 at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
{code}
Message redistribution works only in one direction after that. For example, if we have two nodes in the cluster, *nodeA* and *nodeB*, and this problem appears on *nodeA*, then message redistribution will still transfer messages from *nodeB* to *nodeA*, but not in the reverse direction, because messages get stuck in the queue *$.artemis.internal.sf.sandbox-cluster.CLUSTER_NAME.NODE_NAME* on *nodeA*.

Restarting a cluster node with stuck messages resumes message redistribution.

The issue seems to depend on the message sending rate: when we produce 10 messages per second, it appears within a minute of a full restart of the cluster and the clients.

 

Tested on:

OSs: CentOS 7, Ubuntu 18.04.1 LTS (both with libaio installed)

JDK: Oracle JDK 8, OpenJDK 8

> On-Demand Message Redistribution Can Spontaneously Start Failing in Single 
> Direction
> ------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-1864
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1864
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.5.0
>         Environment: RHEL 6.2
>            Reporter: Ilkka Virolainen
>            Priority: Major
>
> It's possible that the message redistribution of an Artemis cluster can 
> spontaneously fail after running a while. I've witnessed this several times 
> using a two node colocated replicating cluster with a basic configuration:
> {code:java}
> <cluster-connections>
>    <cluster-connection name="my-cluster">
>       <connector-ref>netty-connector</connector-ref>
>       <retry-interval>500</retry-interval>
>       <reconnect-attempts>5</reconnect-attempts>
>       <use-duplicate-detection>true</use-duplicate-detection>
>       <message-load-balancing>ON_DEMAND</message-load-balancing>
>       <max-hops>1</max-hops>
>       <discovery-group-ref discovery-group-name="my-discovery-group"/>
>    </cluster-connection>
> </cluster-connections>{code}
> After running a while (approx. two weeks) one of the nodes (node a) will stop 
> consuming messages from the other node's (node b) internal store-and-forward 
> queue. This will result in message redistribution not working from node b -> 
> node a but will work from node a -> node b. The cause for this is unknown: 
> nothing of note is logged for either broker and JMX shows that the cluster 
> topology and the broker cluster bridge connection are intact. This will cause 
> significant problems, mainly:
> 1. Client communication will only work as expected if the clients happen to 
> connect to the right brokers
> 2. Unconsumed messages will end up piling in the internal store-and-forward 
> queue and consume unnecessary resources. It's also possible (but not 
> verified) that when messages in the internal queue expire, they leak memory.


