Keith Wall created QPIDJMS-365:
----------------------------------

             Summary: [Failover] failover.reconnectDelay not applied between 
connection attempts if peer Closes gracefully
                 Key: QPIDJMS-365
                 URL: https://issues.apache.org/jira/browse/QPIDJMS-365
             Project: Qpid JMS
          Issue Type: Bug
          Components: qpid-jms-client
            Reporter: Keith Wall


When using Broker-J's High Availability feature, the client's failover 
abilities are used to allow the client to discover which Broker in the HA group 
has the master role.  The client tries each Broker on the failover list until 
until one successfully responds to the {{Open}} indicating that it is current 
master.
 
When a node is not in the master role, it gracefully closes the AMQP connection 
by sending a {{Close}} performative.  During election periods, it is normal for 
all nodes in the HA group to respond with the {{Close}} until the election 
concludes.

{noformat}
Close{error=Error{condition=amqp:not-found, description='Virtual host 
'localhost' is not active', info=null}}
{noformat} 

The QpidJMS Client failover options includes a {{failover.reconnectDelay}} 
which "Controls the delay between successive reconnection attempts".   However 
it appears that the reconnection delay is only applied between attempts when a 
connection fails owing to a 'transport' level failure (connection refused etc). 
 If the connection fails at the AMQP level, the delay is not applied.

This is impactful to the HA use-case for Broker-J.   It means that during 
periods of reelection, the client, tightly spins in the reconnection loop, 
excessively consuming system resources.  It is also necessary to ensure that 
{{failover.maxReconnectAttempts}} is set sufficiently large to allow for an 
election period to conclude successfully.  Whilst the user could use unlimited 
reconnection attempts, this is unpleasant as it means the system won't fail in 
the case where the election does not conclude within a reasonable time period.

Extract of TRACE level logging from 
{{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when a 
AMQP connection is closed gracefully ({{Close}} performative):

{noformat}
2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider              
 - Failover: the provider reports failure: Connection closed by external action 
[condition = amqp:connection:forced]
2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider              
 - handling Provider failure: Connection closed by external action [condition = 
amqp:connection:forced]
2018-03-13 11:04:54,951 [lization thread] - TRACE FailoverProvider              
 - stack
java.io.IOException: Connection closed by external action [condition = 
amqp:connection:forced]
        at 
org.apache.qpid.jms.util.IOExceptionSupport.create(IOExceptionSupport.java:45)
        at 
org.apache.qpid.jms.provider.amqp.AmqpProvider.fireProviderException(AmqpProvider.java:1086)
        at 
org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.closeResource(AmqpAbstractResource.java:182)
        at 
org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.processRemoteClose(AmqpAbstractResource.java:262)
        at 
org.apache.qpid.jms.provider.amqp.AmqpProvider.processUpdates(AmqpProvider.java:949)
        at 
org.apache.qpid.jms.provider.amqp.AmqpProvider.access$1900(AmqpProvider.java:104)
        at 
org.apache.qpid.jms.provider.amqp.AmqpProvider$17.run(AmqpProvider.java:831)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: javax.jms.JMSException: Connection closed by external action 
[condition = amqp:connection:forced]
        at 
org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:164)
        at 
org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:117)
        ... 11 more
2018-03-13 11:04:54,954 [ connect thread] - DEBUG FailoverProvider              
 - Connection attempt:[1] to: amqp://localhost:5672 in-progress
2018-03-13 11:04:55,003 [lization thread] - DEBUG FailoverProvider              
 - Signalling connection recovery: AmqpProvider: localhost:5672
2018-03-13 11:04:55,007 [lization thread] - DEBUG FailoverProvider              
 - handling Provider failure: Virtual host 'localhost' is not active [condition 
= amqp:not-found]
{noformat}

Extract of TRACE level logging from 
{{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when a 
AMQP connection is closed at the transport level (Connection Refused):

{noformat}
2018-03-13 11:03:47,069 [lization thread] - TRACE FailoverProvider              
 - stack
java.io.IOException: Transport connection remotely closed.
        at 
org.apache.qpid.jms.provider.amqp.AmqpProvider$20.run(AmqpProvider.java:901)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-03-13 11:03:47,089 [ connect thread] - DEBUG FailoverProvider              
 - Connection attempt:[1] to: amqp://localhost:5672 in-progress
2018-03-13 11:03:47,095 [ connect thread] - INFO  FailoverProvider              
 - Connection attempt:[1] to: amqp://localhost:5672 failed
2018-03-13 11:03:47,095 [ connect thread] - TRACE FailoverProvider              
 - Next reconnect attempt will be in 10000 milliseconds
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to