[
https://issues.apache.org/jira/browse/ARTEMIS-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Justin Bertram resolved ARTEMIS-3174.
-------------------------------------
Fix Version/s: 2.18.0
Resolution: Fixed
> ServerSessionImpl doesn't set new connection when transferring session
> ----------------------------------------------------------------------
>
> Key: ARTEMIS-3174
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3174
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.16.0, 2.17.0
> Environment: We are using CORE connections between the broker and our
> clients.
> Our clients are using the JMS abstraction layer.
> The clients are using the following options:
> * confirmationWindowSize of 1024*1024 (=1MiB)
> * connectionTTL of 60 seconds
> * reconnectAttempts and initialConnectAttempts set to -1 (infinite retry)
>
> We found the issue with Artemis version 2.16.0, but I checked latest master
> and the bug still exists there.
> Reporter: Markus Meierhofer
> Priority: Major
> Fix For: 2.18.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> h3. How the issue was discovered
> Because of a different bug in the broker where sometimes, closing a
> connection didn't clean up the connected sessions
> (https://issues.apache.org/jira/browse/ARTEMIS-2870), we implemented a
> "monitor" process running alongside the broker which, via JMX remote
> management, checks the existing connection IDs and compares them with the
> connection IDs it retrieved from existing consumers/producers. When a
> connectionID of a consumer/producer isn't contained in the list of active
> connection IDs, the monitor process closed the session attached to the
> consumer/producer.
> At a customer site, we had the issue that our monitor process seemingly
> closed sessions of alive connections and not closed ones. This was due to the
> fact that, after a connection failover by the client, the consumer/producer
> JSON object retrieved via JMX still reported to be using the old connection
> ID although the consumer/producer was already using the new connection (when
> checking the logs of the artemis client after failover).
> The scenario is described down below. It shows the logs of the artemis client
> on the one hand and of our "monitor" process connected to the broker via JMX
> on the other hand, which every 10 seconds prints all connection IDs and the
> consumer/producerInfo retrieved.
> At 10:36:45, the artemis client recognized a connection failure:
> {noformat}
> [WARN 2021-03-09 10:36:51,522 l-threads)
> org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to
> fms/10.221.5.49:61616 has been detected: AMQ219014: Timed out after waiting
> 4,000 ms for response when sending packet 71
> [code=CONNECTION_TIMEDOUT]{noformat}
> On the monitor process, you can see that there are 3 connection IDs (b20b08e0
> is the old connection of the client which failed):
> {noformat}
> [DEBUG 2021-03-09 10:36:45,858 cheduler-1
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [b20b08e0,
> 8080f955, 5dc2dff9]
> Found consumers:
> ...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> At 10:37:09 (25 seconds later), the client is able to successfully fail over.
> It seems that he is doing session reattachment instead of session recreate:
> {noformat}
> [ERROR 2021-03-09 10:37:09,307 ead-289224
> nt.control.RestartAgentExceptionListener]: JMS exception listener retrieved
> exception!
> {noformat}
> The monitor process now shows 4 connections (4cc50064 is the new connection
> of the client). Also note that there are still consumers (and their sessions)
> which report the old connection ID. But these sessions are actually used by
> the client (he has successfully transferred the sessions). We conclude this
> from the fact that there are NO consumers/producers with the new connection
> ID, but the client is able to communicate fully with the broker after
> successful failover. Also, the session IDs do not change (which indicates a
> session transfer):
> {noformat}
> [DEBUG 2021-03-09 10:37:15,900 cheduler-9
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064,
> b20b08e0, 8080f955, 5dc2dff9]
> [DEBUG 2021-03-09 10:37:15,901 cheduler-9
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found
> consumers:...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> ~60 seconds after the failure of the old connection, the broker closes the
> connection. Afterwards, the monitor process only shows 3 connection (b20b08e0
> was closed). But you can see that there are still consumers reporting the old
> connection ID (these are consumers of sessions that have actually been
> transferred to the new connection):
> {noformat}
> [DEBUG 2021-03-09 10:37:55,973 heduler-10
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064,
> 8080f955, 5dc2dff9]
> [DEBUG 2021-03-09 10:37:55,973 heduler-10
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found consumers:
> ...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> Afterwards, our monitor process goes on closing the sessions of the consumers
> with the old connection ID, although they were actually used by the new
> connection. This effectively breaks the connection of the artemis client.
> h3. The bug
> After looking through the code, we found a problem during session
> reattachment: In "internaltransferConnection" of
> "ServerSessionPacketHandler", the session is transferred from the old to the
> new connection. It seems mostly correct as the new connection is set both on
> the channel and on the ServerSessionPacketHandler itself. But on the
> "session" member variable (an instance of ServerSessionImpl), the new
> connection is not set, although ServerSessionImpl has a field
> "remotingConnection". In fact, the remotingConnectionField is only set once
> in the constructor, which cannot be correct if the ServerSessionImpl object
> is reused from the old to the new connection.
> Among other usages, the remotingConnection field of ServerSessionImpl is used
> to retrieve the connectionID of the session's connection, which in the end is
> used by the JMX server control (ActiveMQServerControlImpl) in order to build
> a JSON representation of a consumer.
> h3. A simple solution
> The most direct solution would be to set the new connection on the
> remotingConnection field of ServerSessionImpl. I will soon post a pull
> request here containing this change.
> There are some considerations though:
> - Do we need to care about locking when setting the field?
> - The "remotingConnection" field is currently final in ServerSessionImpl, and
> always was since the initial HornetQ donation, I'm not sure what the
> intention behind it was. But on the other hand, the session reattachment also
> worked the same way, so the bug of keeping the old connection on the
> ServerSessionImpl also existed back then.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)