Daniel Hofer created AMQ-6197:
---------------------------------
Summary: Problem using 2 or more NetworkConnectors in a single
broker with NIO TransportConnectors
Key: AMQ-6197
URL: https://issues.apache.org/jira/browse/AMQ-6197
Project: ActiveMQ
Issue Type: Bug
Components: Broker
Affects Versions: 5.12.1
Environment: RHEL 6.6, java-openjdk-1.7.0 u95
Reporter: Daniel Hofer
In order to improve CPU usage in a test setup of a network of brokers
consisting of 3+ brokers using the following broker configuration
<broker useJmx="${activemq.expose.jmx}" persistent="false"
        brokerName="${activemq.brokerName}"
        xmlns="http://activemq.apache.org/schema/core">
  <sslContext>
    <amq:sslContext keyStore="${activemq.broker.keyStore}"
                    keyStorePassword="${activemq.broker.keyStorePassword}"
                    trustStore="${activemq.broker.trustStore}"
                    trustStorePassword="${activemq.broker.trustStorePassword}" />
  </sslContext>
  <systemUsage>
    <systemUsage>
      <memoryUsage>
        <memoryUsage limit="${activemq.memoryUsage}" />
      </memoryUsage>
      <tempUsage>
        <tempUsage limit="${activemq.tempUsage}" />
      </tempUsage>
    </systemUsage>
  </systemUsage>
  <destinationPolicy>
    <policyMap>
      <policyEntries>
        <policyEntry queue=">" enableAudit="false">
          <networkBridgeFilterFactory>
            <conditionalNetworkBridgeFilterFactory
                replayWhenNoConsumers="true" />
          </networkBridgeFilterFactory>
        </policyEntry>
      </policyEntries>
    </policyMap>
  </destinationPolicy>
  <networkConnectors>
    <networkConnector name="queues"
                      uri="static:(${activemq.otherBrokers})"
                      networkTTL="2" dynamicOnly="true"
                      decreaseNetworkConsumerPriority="true"
                      conduitSubscriptions="false">
      <excludedDestinations>
        <topic physicalName=">" />
      </excludedDestinations>
    </networkConnector>
    <networkConnector name="topics"
                      uri="static:(${activemq.otherBrokers})"
                      networkTTL="1" dynamicOnly="true"
                      decreaseNetworkConsumerPriority="true"
                      conduitSubscriptions="true">
      <excludedDestinations>
        <queue physicalName=">" />
      </excludedDestinations>
    </networkConnector>
  </networkConnectors>
  <transportConnectors>
    <transportConnector
        uri="${activemq.protocol}${activemq.host}:${activemq.tcp.port}?needClientAuth=true"
        updateClusterClients="true" rebalanceClusterClients="true" />
    <transportConnector
        uri="${activemq.websocket.protocol}${activemq.websocket.host}:${activemq.websocket.port}?needClientAuth=true"
        updateClusterClients="true" rebalanceClusterClients="true" />
  </transportConnectors>
</broker>
we altered the activemq.protocol placeholder from the original ssl:// to
nio+ssl:// and immediately observed some CPU improvement (note: the same
happens when switching from tcp:// to nio://). However, after a new deployment
we started to encounter strange behavior: some producers would either get
timeouts on their request-reply messages or an "unknown destination" exception
once the reply was sent on a temp queue. The issue only occurred when the
producer and consumer were connected to different brokers in the network.
After some testing we ultimately found that after a restart, brokers would
often not start both network bridges (one for queues and one for topics) but
only one of them. For example, in a 3-broker setup each broker usually had 4
network bridges active, 2 for each other broker. During some restarts,
however, we would see any number of active bridges between 2 and 4, and no
matter how long we waited, the 2nd bridge was never started. The logs also
showed no output whatsoever: as long as one broker was shut down, the other
two would output 'connection refused'; once it started, they would show either
1 or 2 'successfully reconnected' messages.
As soon as we switched the transport connector back to the ssl:// protocol,
the issue was gone for good: no matter how many restarts, all 4 network
bridges were always started in each broker. Switching back to nio://, the
problem returns right away.
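For clarity, with the placeholders resolved, the change that triggers the
problem amounts to something like the following (host and port are
illustrative, not actual values from our deployment):

```xml
<!-- before: blocking SSL transport (all 4 bridges start reliably) -->
<transportConnector
    uri="ssl://0.0.0.0:61616?needClientAuth=true"
    updateClusterClients="true" rebalanceClusterClients="true" />

<!-- after: NIO-based SSL transport (bridges sometimes fail to start) -->
<transportConnector
    uri="nio+ssl://0.0.0.0:61616?needClientAuth=true"
    updateClusterClients="true" rebalanceClusterClients="true" />
```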
For now we are checking whether it is worthwhile to start an additional
TransportConnector running nio:// just for producers and consumers, while the
network bridges use the tcp:// connector. The documentation regarding
NetworkConnectors usually uses tcp:// or multicast:// in the
TransportConnector the bridges attach to, so we are not entirely sure whether
nio:// is even supposed to work in this case or whether this is indeed a bug
somewhere.
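The workaround we are evaluating would look roughly like the following sketch
(connector names and ports are illustrative, not actual values from our
setup): a blocking tcp:// connector reserved for the network bridges plus a
nio:// connector for clients:

```xml
<!-- sketch of the workaround under evaluation; names/ports illustrative -->
<transportConnectors>
  <!-- blocking tcp:// connector the other brokers' networkConnectors target -->
  <transportConnector name="network"
      uri="tcp://0.0.0.0:61616" />
  <!-- nio:// connector used only by producers and consumers -->
  <transportConnector name="clients"
      uri="nio://0.0.0.0:61617"
      updateClusterClients="true" rebalanceClusterClients="true" />
</transportConnectors>
```

The other brokers' networkConnector static:() URIs would then point at the
tcp:// port only, so the bridges never touch the NIO transport.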
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)