[
https://issues.apache.org/jira/browse/ARTEMIS-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303645#comment-17303645
]
ASF subversion and git services commented on ARTEMIS-3174:
----------------------------------------------------------
Commit 3b9008bda008ee8e7413c05c5c6238503d9a3bcc in activemq-artemis's branch
refs/heads/master from Markus Meierhofer
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=3b9008b ]
ARTEMIS-3174: Set new connection on ServerSession during reattachment
During session reattachment, also set the new connection on the
"session" member of the ServerSessionPacketHandler. Until now,
the connected ServerSessionImpl instance still referenced the old
connection although it had already been transferred on the new connection.
> ServerSessionImpl doesn't set new connection when transferring session
> ----------------------------------------------------------------------
>
> Key: ARTEMIS-3174
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3174
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.16.0, 2.17.0
> Environment: We are using CORE connections between the broker and our
> clients.
> Our clients are using the JMS abstraction layer.
> The clients are using the following options:
> * confirmationWindowSize of 1024*1024 (=1MiB)
> * connectionTTL of 60 seconds
> * reconnectAttempts and initialConnectAttempts set to -1 (infinite retry)
>
> We found the issue with Artemis version 2.16.0, but I checked latest master
> and the bug still exists there.
> Reporter: Markus Meierhofer
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> h3. How the issue was discovered
> Because of a different bug in the broker where sometimes, closing a
> connection didn't clean up the connected sessions
> (https://issues.apache.org/jira/browse/ARTEMIS-2870), we implemented a
> "monitor" process running alongside the broker which, via JMX remote
> management, checks the existing connection IDs and compares them with the
> connection IDs it retrieved from existing consumers/producers. When a
> connectionID of a consumer/producer isn't contained in the list of active
> connection IDs, the monitor process closed the session attached to the
> consumer/producer.
> At a customer site, we had the issue that our monitor process seemingly
> closed sessions of alive connections and not closed ones. This was due to the
> fact that, after a connection failover by the client, the consumer/producer
> JSON object retrieved via JMX still reported to be using the old connection
> ID although the consumer/producer was already using the new connection (when
> checking the logs of the artemis client after failover).
> The scenario is described down below. It shows the logs of the artemis client
> on the one hand and of our "monitor" process connected to the broker via JMX
> on the other hand, which every 10 seconds prints all connection IDs and the
> consumer/producerInfo retrieved.
> At 10:36:45, the artemis client recognized a connection failure:
> {noformat}
> [WARN 2021-03-09 10:36:51,522 l-threads)
> org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to
> fms/10.221.5.49:61616 has been detected: AMQ219014: Timed out after waiting
> 4,000 ms for response when sending packet 71
> [code=CONNECTION_TIMEDOUT]{noformat}
> On the monitor process, you can see that there are 3 connection IDs (b20b08e0
> is the old connection of the client which failed):
> {noformat}
> [DEBUG 2021-03-09 10:36:45,858 cheduler-1
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [b20b08e0,
> 8080f955, 5dc2dff9]
> Found consumers:
> ...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> At 10:37:09 (25 seconds later), the client is able to successfully fail over.
> It seems that he is doing session reattachment instead of session recreate:
> {noformat}
> [ERROR 2021-03-09 10:37:09,307 ead-289224
> nt.control.RestartAgentExceptionListener]: JMS exception listener retrieved
> exception!
> {noformat}
> The monitor process now shows 4 connections (4cc50064 is the new connection
> of the client). Also note that there are still consumers (and their sessions)
> which report the old connection ID. But these sessions are actually used by
> the client (he has successfully transferred the sessions). We conclude this
> from the fact that there are NO consumers/producers with the new connection
> ID, but the client is able to communicate fully with the broker after
> successful failover. Also, the session IDs do not change (which indicates a
> session transfer):
> {noformat}
> [DEBUG 2021-03-09 10:37:15,900 cheduler-9
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064,
> b20b08e0, 8080f955, 5dc2dff9]
> [DEBUG 2021-03-09 10:37:15,901 cheduler-9
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found
> consumers:...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> ~60 seconds after the failure of the old connection, the broker closes the
> connection. Afterwards, the monitor process only shows 3 connection (b20b08e0
> was closed). But you can see that there are still consumers reporting the old
> connection ID (these are consumers of sessions that have actually been
> transferred to the new connection):
> {noformat}
> [DEBUG 2021-03-09 10:37:55,973 heduler-10
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064,
> 8080f955, 5dc2dff9]
> [DEBUG 2021-03-09 10:37:55,973 heduler-10
> fms.jmx.ClearDeadBrokerSessionsScheduler]: Found consumers:
> ...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
> {noformat}
> Afterwards, our monitor process goes on closing the sessions of the consumers
> with the old connection ID, although they were actually used by the new
> connection. This effectively breaks the connection of the artemis client.
> h3. The bug
> After looking through the code, we found a problem during session
> reattachment: In "internaltransferConnection" of
> "ServerSessionPacketHandler", the session is transferred from the old to the
> new connection. It seems mostly correct as the new connection is set both on
> the channel and on the ServerSessionPacketHandler itself. But on the
> "session" member variable (an instance of ServerSessionImpl), the new
> connection is not set, although ServerSessionImpl has a field
> "remotingConnection". In fact, the remotingConnectionField is only set once
> in the constructor, which cannot be correct if the ServerSessionImpl object
> is reused from the old to the new connection.
> Among other usages, the remotingConnection field of ServerSessionImpl is used
> to retrieve the connectionID of the session's connection, which in the end is
> used by the JMX server control (ActiveMQServerControlImpl) in order to build
> a JSON representation of a consumer.
> h3. A simple solution
> The most direct solution would be to set the new connection on the
> remotingConnection field of ServerSessionImpl. I will soon post a pull
> request here containing this change.
> There are some considerations though:
> - Do we need to care about locking when setting the field?
> - The "remotingConnection" field is currently final in ServerSessionImpl, and
> always was since the initial HornetQ donation, I'm not sure what the
> intention behind it was. But on the other hand, the session reattachment also
> worked the same way, so the bug of keeping the old connection on the
> ServerSessionImpl also existed back then.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)