Markus Meierhofer created ARTEMIS-3174:
------------------------------------------

             Summary: ServerSessionImpl doesn't set new connection when 
transferring session
                 Key: ARTEMIS-3174
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3174
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: Broker
    Affects Versions: 2.17.0, 2.16.0
         Environment: We are using CORE connections between the broker and our 
clients.

Our clients are using the JMS abstraction layer.

The clients are using the following options:
 * confirmationWindowSize of 1024*1024 (=1MiB)
 * connectionTTL of 60 seconds
 * reconnectAttempts and initialConnectAttempts set to -1 (infinite retry)

 

We found the issue with Artemis version 2.16.0, but I checked latest master and 
the bug still exists there.
            Reporter: Markus Meierhofer


h3. How the issue was discovered

Because of a different bug in the broker where sometimes, closing a connection 
didn't clean up the connected sessions 
(https://issues.apache.org/jira/browse/ARTEMIS-2870), we implemented a 
"monitor" process running alongside the broker which, via JMX remote 
management, checks the existing connection IDs and compares them with the 
connection IDs it retrieved from existing consumers/producers. When a 
connectionID of a consumer/producer isn't contained in the list of active 
connection IDs, the monitor process closed the session attached to the 
consumer/producer.

At a customer site, we had the issue that our monitor process seemingly closed 
sessions of alive connections and not closed ones. This was due to the fact 
that, after a connection failover by the client, the consumer/producer JSON 
object retrieved via JMX still reported to be using the old connection ID 
although the consumer/producer was already using the new connection (when 
checking the logs of the artemis client after failover).

The scenario is described down below. It shows the logs of the artemis client 
on the one hand and of our "monitor" process connected to the broker via JMX on 
the other hand, which every 10 seconds prints all connection IDs and the 
consumer/producerInfo retrieved.

At 10:36:45, the artemis client recognized a connection failure:
{noformat}
[WARN 2021-03-09 10:36:51,522 l-threads)  
org.apache.activemq.artemis.core.client]: AMQ212037: Connection failure to 
fms/10.221.5.49:61616 has been detected: AMQ219014: Timed out after waiting 
4,000 ms for response when sending packet 71 
[code=CONNECTION_TIMEDOUT]{noformat}
On the monitor process, you can see that there are 3 connection IDs (b20b08e0 
is the old connection of the client which failed):
 {noformat}
[DEBUG 2021-03-09 10:36:45,858 cheduler-1 
fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [b20b08e0, 
8080f955, 5dc2dff9]
Found consumers: 
...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
{noformat}

At 10:37:09 (25 seconds later), the client is able to successfully fail over. 
It seems that he is doing session reattachment instead of session recreate:
{noformat}
[ERROR 2021-03-09 10:37:09,307 ead-289224 
nt.control.RestartAgentExceptionListener]: JMS exception listener retrieved 
exception!
{noformat}
The monitor process now shows 4 connections (4cc50064 is the new connection of 
the client). Also note that there are still consumers (and their sessions) 
which report the old connection ID. But these sessions are actually used by the 
client (he has successfully transferred the sessions). We conclude this from 
the fact that there are NO consumers/producers with the new connection ID, but 
the client is able to communicate fully with the broker after successful 
failover. Also, the session IDs do not change (which indicates a session 
transfer):
{noformat}
[DEBUG 2021-03-09 10:37:15,900 cheduler-9 
fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064, 
b20b08e0, 8080f955, 5dc2dff9]
 [DEBUG 2021-03-09 10:37:15,901 cheduler-9 
fms.jmx.ClearDeadBrokerSessionsScheduler]: Found 
consumers:...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
{noformat}

~60 seconds after the failure of the old connection, the broker closes the 
connection. Afterwards, the monitor process only shows 3 connection (b20b08e0 
was closed). But you can see that there are still consumers reporting the old 
connection ID (these are consumers of sessions that have actually been 
transferred to the new connection):
{noformat}
[DEBUG 2021-03-09 10:37:55,973 heduler-10 
fms.jmx.ClearDeadBrokerSessionsScheduler]: Found connectionIDs: [4cc50064, 
8080f955, 5dc2dff9]
[DEBUG 2021-03-09 10:37:55,973 heduler-10 
fms.jmx.ClearDeadBrokerSessionsScheduler]: Found consumers: 
...,{"consumerID":0,"connectionID":"b20b08e0","sessionID":"237f52c1-8039-11eb-b5ae-2acf578dd699","queueName":"incubed.queue.robot_1000.robot_command","browseOnly":false,"creationTime":1615226846214,"deliveringCount":0},...
{noformat}
Afterwards, our monitor process goes on closing the sessions of the consumers 
with the old connection ID, although they were actually used by the new 
connection. This effectively breaks the connection of the artemis client.

h3. The bug

After looking through the code, we found a problem during session reattachment: 
In "internaltransferConnection" of "ServerSessionPacketHandler", the session is 
transferred from the old to the new connection. It seems mostly correct as the 
new connection is set both on the channel and on the ServerSessionPacketHandler 
itself. But on the "session" member variable (an instance of 
ServerSessionImpl), the new connection is not set, although ServerSessionImpl 
has a field "remotingConnection". In fact, the remotingConnectionField is only 
set once in the constructor, which cannot be correct if the ServerSessionImpl 
object is reused from the old to the new connection.

Among other usages, the remotingConnection field of ServerSessionImpl is used 
to retrieve the connectionID of the session's connection, which in the end is 
used by the JMX server control (ActiveMQServerControlImpl) in order to build a 
JSON representation of a consumer.

h3. A simple solution

The most direct solution would be to set the new connection on the 
remotingConnection field of ServerSessionImpl. I will soon post a pull request 
here containing this change.
There are some considerations though:
- Do we need to care about locking when setting the field?
- The "remotingConnection" field is currently final in ServerSessionImpl, and 
always was since the initial HornetQ donation, I'm not sure what the intention 
behind it was. But on the other hand, the session reattachment also worked the 
same way, so the bug of keeping the old connection on the ServerSessionImpl 
also existed back then.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to