[ 
https://issues.apache.org/jira/browse/QPIDJMS-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397385#comment-16397385
 ] 

Robbie Gemmell commented on QPIDJMS-365:
----------------------------------------

I was able to reproduce this in a test and look into why it occurs, its just 
the interim failures that play into it rather than the initial one. I noticed a 
potential ultra-hacky change, but it'll need some thought on how to address 
properly.

> [Failover] failover.reconnectDelay not applied between connection attempts if 
> peer Closes gracefully
> ----------------------------------------------------------------------------------------------------
>
>                 Key: QPIDJMS-365
>                 URL: https://issues.apache.org/jira/browse/QPIDJMS-365
>             Project: Qpid JMS
>          Issue Type: Bug
>          Components: qpid-jms-client
>            Reporter: Keith Wall
>            Priority: Major
>
> When using Broker-J's High Availability feature, the client's failover 
> abilities are used to allow the client to discover which Broker in the HA 
> group has the master role.  The client tries each Broker on the failover list 
> until until one successfully responds to the {{Open}} indicating that it is 
> current master.
>  
> When a node is not in the master role, it gracefully closes the AMQP 
> connection by sending a {{Close}} performative.  During election periods, it 
> is normal for all nodes in the HA group to respond with the {{Close}} until 
> the election concludes.
> {noformat}
> Close{error=Error{condition=amqp:not-found, description='Virtual host 
> 'localhost' is not active', info=null}}
> {noformat} 
> The QpidJMS Client failover options includes a {{failover.reconnectDelay}} 
> which "Controls the delay between successive reconnection attempts".   
> However it appears that the reconnection delay is only applied between 
> attempts when a connection fails owing to a 'transport' level failure 
> (connection refused etc).  If the connection fails at the AMQP level, the 
> delay is not applied.
> This is impactful to the HA use-case for Broker-J.   It means that during 
> periods of reelection, the client, tightly spins in the reconnection loop, 
> excessively consuming system resources.  It is also necessary to ensure that 
> {{failover.maxReconnectAttempts}} is set sufficiently large to allow for an 
> election period to conclude successfully.  Whilst the user could use 
> unlimited reconnection attempts, this is unpleasant as it means the system 
> won't fail in the case where the election does not conclude within a 
> reasonable time period.
> Extract of TRACE level logging from 
> {{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when 
> a AMQP connection is closed gracefully ({{Close}} performative):
> {noformat}
> 2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider            
>    - Failover: the provider reports failure: Connection closed by external 
> action [condition = amqp:connection:forced]
> 2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider            
>    - handling Provider failure: Connection closed by external action 
> [condition = amqp:connection:forced]
> 2018-03-13 11:04:54,951 [lization thread] - TRACE FailoverProvider            
>    - stack
> java.io.IOException: Connection closed by external action [condition = 
> amqp:connection:forced]
>       at 
> org.apache.qpid.jms.util.IOExceptionSupport.create(IOExceptionSupport.java:45)
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpProvider.fireProviderException(AmqpProvider.java:1086)
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.closeResource(AmqpAbstractResource.java:182)
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.processRemoteClose(AmqpAbstractResource.java:262)
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpProvider.processUpdates(AmqpProvider.java:949)
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpProvider.access$1900(AmqpProvider.java:104)
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpProvider$17.run(AmqpProvider.java:831)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.jms.JMSException: Connection closed by external action 
> [condition = amqp:connection:forced]
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:164)
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:117)
>       ... 11 more
> 2018-03-13 11:04:54,954 [ connect thread] - DEBUG FailoverProvider            
>    - Connection attempt:[1] to: amqp://localhost:5672 in-progress
> 2018-03-13 11:04:55,003 [lization thread] - DEBUG FailoverProvider            
>    - Signalling connection recovery: AmqpProvider: localhost:5672
> 2018-03-13 11:04:55,007 [lization thread] - DEBUG FailoverProvider            
>    - handling Provider failure: Virtual host 'localhost' is not active 
> [condition = amqp:not-found]
> {noformat}
> Extract of TRACE level logging from 
> {{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when 
> a AMQP connection is closed at the transport level (Connection Refused):
> {noformat}
> 2018-03-13 11:03:47,069 [lization thread] - TRACE FailoverProvider            
>    - stack
> java.io.IOException: Transport connection remotely closed.
>       at 
> org.apache.qpid.jms.provider.amqp.AmqpProvider$20.run(AmqpProvider.java:901)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> 2018-03-13 11:03:47,089 [ connect thread] - DEBUG FailoverProvider            
>    - Connection attempt:[1] to: amqp://localhost:5672 in-progress
> 2018-03-13 11:03:47,095 [ connect thread] - INFO  FailoverProvider            
>    - Connection attempt:[1] to: amqp://localhost:5672 failed
> 2018-03-13 11:03:47,095 [ connect thread] - TRACE FailoverProvider            
>    - Next reconnect attempt will be in 10000 milliseconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to