[
https://issues.apache.org/jira/browse/QPIDJMS-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489615#comment-17489615
]
Robbie Gemmell commented on QPIDJMS-563:
----------------------------------------
Yes, the reconnect delay is reset to its initial value after a connection is
successfully reestablished (which, to be clear, it actually is in this case);
otherwise each failover would wait longer than the previous one.
Unfortunately, in this case the server then just barfs and kills the connection
again as soon as it is used to send. However, that reset probably isn't the
issue here. Rather, I forgot that the first connect attempt is always made
immediately upon the drop; it's only the next one that picks up the initial
delay and starts to back off. In this case that first reconnect will always
succeed, scuppering things.
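To make that schedule concrete, here is a minimal sketch of the behaviour
described above. It is not the FailoverProvider code: the class, method names
and delay values are all made up for illustration, and it only models "first
attempt immediate, later attempts back off, success resets".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reconnect schedule; not the qpid-jms implementation.
class ReconnectDelaySketch {
    // Illustrative values only; these are not claimed to be the client defaults.
    static final long INITIAL_DELAY_MS = 1000;
    static final long MAX_DELAY_MS = 8000;
    static final double BACKOFF_MULTIPLIER = 2.0;

    private int attempts = 0;
    private long nextDelayMs = INITIAL_DELAY_MS;

    // Delay to wait before the next connect attempt of the current outage.
    long nextAttemptDelay() {
        attempts++;
        if (attempts == 1) {
            return 0; // the first attempt after a drop is immediate
        }
        long delay = nextDelayMs;
        nextDelayMs = Math.min((long) (nextDelayMs * BACKOFF_MULTIPLIER), MAX_DELAY_MS);
        return delay;
    }

    // Called when a reconnect succeeds: the schedule starts over.
    void connectionEstablished() {
        attempts = 0;
        nextDelayMs = INITIAL_DELAY_MS;
    }

    // Replays 'outages' connection drops, each needing 'failuresPerOutage'
    // failed attempts before one succeeds, and records every delay used.
    static List<Long> simulate(int failuresPerOutage, int outages) {
        ReconnectDelaySketch control = new ReconnectDelaySketch();
        List<Long> delays = new ArrayList<>();
        for (int o = 0; o < outages; o++) {
            for (int a = 0; a <= failuresPerOutage; a++) {
                delays.add(control.nextAttemptDelay());
            }
            control.connectionEstablished();
        }
        return delays;
    }

    public static void main(String[] args) {
        // Server accepts every reconnect straight away: no delay is ever applied.
        System.out.println(simulate(0, 3)); // [0, 0, 0]
        // Three failures per outage: backoff kicks in, then resets on success.
        System.out.println(simulate(3, 2));
    }
}
```

When every first reconnect succeeds, as with this server, the recorded delays
are all zero, so the backoff never gets a chance to slow the loop down.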
I tried my suggestion and it worked as I expected: the send timed out and
threw. The main difference in what I tried seems to be that I used a lower
timeout than you did, which resulted in different behaviour due to that
immediate retry I forgot about, and also because the server seems to take a
couple of seconds to actually kill the connection.
I think that if the configured timeout is larger than that server delay, the
server kills the connection, the initial reconnect attempt is made and
succeeds, and things are back in exactly the same situation as originally. That
is happening as I had outlined it might: the server isn't rejecting the message
at all, it is simply killing the entire connection after the message arrives,
then a successful reconnect occurs, then the server kills the connection again
on the next send, rinse and repeat. If the send timeout is below the server's
delay in dropping the connection, the client reacts as I expected, timing the
send out and resetting things until the next send call is made and provokes the
issue again.
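As a rough sketch of that comparison, assuming the timeout in question is the
client's jms.sendTimeout URI option (in milliseconds) and that the server's
kill delay is known: the helper names, the broker address and the numbers are
hypothetical, and how the equal case behaves is a guess.

```java
// Hypothetical helper; only the jms.sendTimeout option name is the client's own.
class SendTimeoutSketch {
    enum Outcome { SEND_TIMES_OUT, RECONNECT_LOOP }

    // Builds a failover URI that carries the send timeout as a jms.* option.
    static String failoverUri(String brokerUri, long sendTimeoutMs) {
        return "failover:(" + brokerUri + ")?jms.sendTimeout=" + sendTimeoutMs;
    }

    // Predicts which of the two behaviours described above would be seen.
    // The equal case is not covered by the description; treating it as a
    // loop is an assumption.
    static Outcome predict(long sendTimeoutMs, long serverKillDelayMs) {
        return sendTimeoutMs < serverKillDelayMs
                ? Outcome.SEND_TIMES_OUT
                : Outcome.RECONNECT_LOOP;
    }

    public static void main(String[] args) {
        long serverKillDelayMs = 2000; // "a couple of seconds", made-up figure
        System.out.println(failoverUri("amqp://localhost:5672", 1000));
        System.out.println(predict(1000, serverKillDelayMs));
        System.out.println(predict(5000, serverKillDelayMs));
    }
}
```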
I don't think your proposal is appropriate, as it breaks the intended behaviour
of that segment: it then won't cancel that task as expected. Essentially it
'fixes' things for you by leaving a task running forever that is only intended
to exist during disconnection. It just happens that this works around the
crappy server behaviour, because the task later runs at a point it wasn't
really intended to.
It's possible some other change in this space could handle that better and allow
the timeout to work around the server behaviour more consistently, though.
> QPID JMS - Failover - endless loop on sending a message with reject outcome
> ---------------------------------------------------------------------------
>
> Key: QPIDJMS-563
> URL: https://issues.apache.org/jira/browse/QPIDJMS-563
> Project: Qpid JMS
> Issue Type: Bug
> Components: qpid-jms-client
> Affects Versions: 1.5.0
> Reporter: Thomas Stollenwerk
> Priority: Critical
> Attachments: rabbitmq-client-with-failover.zip,
> rabbitmq-queue-config.png
>
>
> +*Summary:*+
> An AMQP v1 message that is rejected by the AMQP v1 broker results in
> an endless failover loop that blocks the JmsMessageProducer.send method.
> +*Expected:*+
> The failover of the Qpid JMS client tries to connect and send the rejected
> message at most failover.maxReconnectAttempts times.
> +*Actual:*+
> The result of the message send task is not taken into account for counting
> the attempts.
> The JmsMessageProducer.send gets stuck forever.
> # Failover connects to the failover uri and then resets the attempt counter.
> # Sending the message results in REJECT outcome.
> # Failover connects to the next failover uri and then resets the attempt
> counter.
> # Sending the message results in REJECT outcome.
> # ... repeats and does not stop on maxReconnectAttempts.
> +*Code analysis:*+
> *FailoverProvider.java#L1282* already resets the connection attempts with
> *reconnectControl.connectionEstablished()* not waiting for the success of the
> send task, resulting in an endless failover loop.
> +*Background:*+
> In our case we are using RabbitMQ with the AMQP v1 plugin. There are
> scenarios where the broker answers with REJECT outcomes (overflow
> policy, high watermark etc). Right now we cannot use this feature because the
> client gets stuck and RabbitMQ is flooded with reconnects.
> We would be very grateful for a fix.
> Thanks for the good work and best regards
> Thomas
>
>