[ https://issues.apache.org/jira/browse/QPIDJMS-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401399#comment-16401399 ]
ASF subversion and git services commented on QPIDJMS-365: --------------------------------------------------------- Commit bb7b596bbed768b3e3cf03a02a8951b7fdd6d686 in qpid-jms's branch refs/heads/master from [~tabish121] [ https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git;h=bb7b596 ] QPIDJMS-365 Ensure failed provider is cleared if close fails > [Failover] failover.reconnectDelay not applied between connection attempts if > peer Closes gracefully > ---------------------------------------------------------------------------------------------------- > > Key: QPIDJMS-365 > URL: https://issues.apache.org/jira/browse/QPIDJMS-365 > Project: Qpid JMS > Issue Type: Bug > Components: qpid-jms-client > Reporter: Keith Wall > Assignee: Timothy Bish > Priority: Major > > When using Broker-J's High Availability feature, the client's failover > abilities are used to allow the client to discover which Broker in the HA > group has the master role. The client tries each Broker on the failover list > until until one successfully responds to the {{Open}} indicating that it is > current master. > > When a node is not in the master role, it gracefully closes the AMQP > connection by sending a {{Close}} performative. During election periods, it > is normal for all nodes in the HA group to respond with the {{Close}} until > the election concludes. > {noformat} > Close{error=Error{condition=amqp:not-found, description='Virtual host > 'localhost' is not active', info=null}} > {noformat} > The QpidJMS Client failover options includes a {{failover.reconnectDelay}} > which "Controls the delay between successive reconnection attempts". > However it appears that the reconnection delay is only applied between > attempts when a connection fails owing to a 'transport' level failure > (connection refused etc). If the connection fails at the AMQP level, the > delay is not applied. > This is impactful to the HA use-case for Broker-J. It means that during > periods of reelection, the client, tightly spins in the reconnection loop, > excessively consuming system resources. It is also necessary to ensure that > {{failover.maxReconnectAttempts}} is set sufficiently large to allow for an > election period to conclude successfully. Whilst the user could use > unlimited reconnection attempts, this is unpleasant as it means the system > won't fail in the case where the election does not conclude within a > reasonable time period. > Extract of TRACE level logging from > {{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when > a AMQP connection is closed gracefully ({{Close}} performative): > {noformat} > 2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider > - Failover: the provider reports failure: Connection closed by external > action [condition = amqp:connection:forced] > 2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider > - handling Provider failure: Connection closed by external action > [condition = amqp:connection:forced] > 2018-03-13 11:04:54,951 [lization thread] - TRACE FailoverProvider > - stack > java.io.IOException: Connection closed by external action [condition = > amqp:connection:forced] > at > org.apache.qpid.jms.util.IOExceptionSupport.create(IOExceptionSupport.java:45) > at > org.apache.qpid.jms.provider.amqp.AmqpProvider.fireProviderException(AmqpProvider.java:1086) > at > org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.closeResource(AmqpAbstractResource.java:182) > at > org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.processRemoteClose(AmqpAbstractResource.java:262) > at > org.apache.qpid.jms.provider.amqp.AmqpProvider.processUpdates(AmqpProvider.java:949) > at > org.apache.qpid.jms.provider.amqp.AmqpProvider.access$1900(AmqpProvider.java:104) > at > org.apache.qpid.jms.provider.amqp.AmqpProvider$17.run(AmqpProvider.java:831) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: javax.jms.JMSException: Connection closed by external action > [condition = amqp:connection:forced] > at > org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:164) > at > org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:117) > ... 11 more > 2018-03-13 11:04:54,954 [ connect thread] - DEBUG FailoverProvider > - Connection attempt:[1] to: amqp://localhost:5672 in-progress > 2018-03-13 11:04:55,003 [lization thread] - DEBUG FailoverProvider > - Signalling connection recovery: AmqpProvider: localhost:5672 > 2018-03-13 11:04:55,007 [lization thread] - DEBUG FailoverProvider > - handling Provider failure: Virtual host 'localhost' is not active > [condition = amqp:not-found] > {noformat} > Extract of TRACE level logging from > {{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when > a AMQP connection is closed at the transport level (Connection Refused): > {noformat} > 2018-03-13 11:03:47,069 [lization thread] - TRACE FailoverProvider > - stack > java.io.IOException: Transport connection remotely closed. > at > org.apache.qpid.jms.provider.amqp.AmqpProvider$20.run(AmqpProvider.java:901) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-03-13 11:03:47,089 [ connect thread] - DEBUG FailoverProvider > - Connection attempt:[1] to: amqp://localhost:5672 in-progress > 2018-03-13 11:03:47,095 [ connect thread] - INFO FailoverProvider > - Connection attempt:[1] to: amqp://localhost:5672 failed > 2018-03-13 11:03:47,095 [ connect thread] - TRACE FailoverProvider > - Next reconnect attempt will be in 10000 milliseconds > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org