[jira] [Commented] (QPIDJMS-563) QPID JMS - Failover - endless loop on sending a message with reject outcome

Robbie Gemmell (Jira) Wed, 09 Feb 2022 07:03:06 -0800


    [ 
https://issues.apache.org/jira/browse/QPIDJMS-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489615#comment-17489615
 ]


Robbie Gemmell commented on QPIDJMS-563:
----------------------------------------

Yes, the reconnect delay is reset to its initial value after a connection is
successfully re-established (which, to be clear, it actually is in this case);
otherwise each failover would wait longer than the previous one. Unfortunately,
in this case the server then just barfs and kills the connection again as soon
as it is used to send. However, that reset probably isn't the issue here.
Rather, I forgot that the first reconnect attempt is always made immediately
upon the drop; it's only the next one that picks up the initial delay and
starts to back off. In this case that first reconnect will always succeed,
scuppering things.
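
To make the delay behaviour described above concrete, here is a minimal
sketch of it in Java. It is a simplified model of the description only, not
the client's actual FailoverProvider code: the class name, method names and
the 10 ms / x2 / 30 s values are all assumptions chosen for illustration.

```java
/**
 * Hypothetical model of the reconnect delay behaviour described above.
 * Not the real qpid-jms implementation; all names and values are assumed.
 */
final class ReconnectDelaySketch {
    private final long initialDelayMs;      // assumed initial delay
    private final double backOffMultiplier; // assumed back-off factor
    private final long maxDelayMs;          // assumed upper bound

    private int attempts;     // attempts since the last successful connect
    private long lastDelayMs; // delay used for the previous attempt

    ReconnectDelaySketch(long initialDelayMs, double backOffMultiplier, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.backOffMultiplier = backOffMultiplier;
        this.maxDelayMs = maxDelayMs;
    }

    /** Returns how long to wait before the next reconnect attempt. */
    long nextAttemptDelay() {
        attempts++;
        if (attempts == 1) {
            return 0; // first attempt after the drop is immediate
        }
        if (attempts == 2) {
            lastDelayMs = initialDelayMs; // second attempt picks up the initial delay
        } else {
            // later attempts back off, capped at the maximum
            lastDelayMs = Math.min(maxDelayMs, (long) (lastDelayMs * backOffMultiplier));
        }
        return lastDelayMs;
    }

    /** Called after a successful connect: the delay state starts over. */
    void connectionEstablished() {
        attempts = 0;
        lastDelayMs = 0;
    }

    public static void main(String[] args) {
        // Connects keep failing: delays grow as 0, 10, 20, 40.
        ReconnectDelaySketch failing = new ReconnectDelaySketch(10, 2.0, 30_000);
        for (int i = 0; i < 4; i++) {
            System.out.println("failing connect, delay = " + failing.nextAttemptDelay());
        }

        // Server accepts the connection, then kills it on send: every cycle
        // uses the immediate first attempt, so the delay is always 0.
        ReconnectDelaySketch killedOnSend = new ReconnectDelaySketch(10, 2.0, 30_000);
        for (int cycle = 0; cycle < 3; cycle++) {
            System.out.println("kill-on-send cycle, delay = " + killedOnSend.nextAttemptDelay());
            killedOnSend.connectionEstablished(); // reconnect succeeds, state resets
        }
    }
}
```

In this model the back-off never comes into play for the kill-on-send case,
because each drop is followed by an immediate attempt that succeeds and
resets the state.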

I tried my suggestion and it worked as I expected: the send timed out and
threw. The main difference in what I tried seems to be that I used a lower
timeout than you did, which resulted in different behaviour due to that
immediate retry I forgot about, and also the fact that the server seems to
take a couple of seconds to actually kill the connection, which also
influences things.

I think that if the configured timeout is larger than that server delay, then
the server kills the connection, and the initial reconnect attempt is made and
succeeds. That puts things back in exactly the same situation as was occurring
originally, which is happening as I had outlined it might be (i.e. the server
isn't rejecting the message at all, it is simply killing the entire connection
after the message arrives, then a successful reconnect occurs, then the
process repeats itself as the server kills the connection on send, rinse and
repeat). If the send timeout is below the server's delay in dropping the
connection, then the client reacted as I expected it would, timing the send
out and resetting things until the next send call is made and provokes the
issue again.
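
The interaction between the two timings can be sketched as below. This is
only a model of the reasoning above, under assumptions: that the timeout in
question is the client's jms.sendTimeout URI option, that the server delay is
about 2000 ms, and that the host, port and class/method names are
placeholders. The case where both values are equal is not covered by the
observations above and is arbitrarily treated as a loop here.

```java
/**
 * Hypothetical model of the send timeout versus the server's delay in
 * killing the connection. Names, host and values are assumptions.
 */
final class SendTimeoutSketch {

    enum Outcome {
        SEND_TIMED_OUT, // send() throws before the server drops the connection
        RECONNECT_LOOP  // server drops first, immediate reconnect succeeds, repeat
    }

    /** Predicts the behaviour described above for the given timings. */
    static Outcome outcome(long sendTimeoutMs, long serverKillDelayMs) {
        // Equality is an arbitrary choice; only "below" and "larger" were observed.
        return sendTimeoutMs < serverKillDelayMs
                ? Outcome.SEND_TIMED_OUT
                : Outcome.RECONNECT_LOOP;
    }

    /** Builds a failover URI carrying the send timeout (assumed option: jms.sendTimeout). */
    static String connectionUri(String host, int port, long sendTimeoutMs) {
        return "failover:(amqp://" + host + ":" + port + ")?jms.sendTimeout=" + sendTimeoutMs;
    }

    public static void main(String[] args) {
        long assumedServerKillDelayMs = 2000; // "a couple of seconds", assumed

        for (long timeoutMs : new long[] {1000, 5000}) {
            System.out.println(connectionUri("localhost", 5672, timeoutMs)
                    + " -> " + outcome(timeoutMs, assumedServerKillDelayMs));
        }
    }
}
```

With these assumed numbers, a 1000 ms timeout lands in the timed-out case and
a 5000 ms timeout lands back in the reconnect loop.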

I don't think your proposal is appropriate, as it breaks the intended
behaviour of that segment: it then won't cancel that task as expected.
Essentially it 'fixes' things for you by leaving a task running forever that
is only intended to exist during disconnection. It just happens that this
works around the crappy server behaviour by running the task later, at a point
where it wasn't really intended to run.

It's possible that some other change in this space could handle that better
and allow the timeout to work around the server behaviour more consistently.

> QPID JMS - Failover - endless loop on sending a message with reject outcome
> ---------------------------------------------------------------------------
>
>                 Key: QPIDJMS-563
>                 URL: https://issues.apache.org/jira/browse/QPIDJMS-563
>             Project: Qpid JMS
>          Issue Type: Bug
>          Components: qpid-jms-client
>    Affects Versions: 1.5.0
>            Reporter: Thomas Stollenwerk
>            Priority: Critical
>         Attachments: rabbitmq-client-with-failover.zip, 
> rabbitmq-queue-config.png
>
>
> +*Summary:*+
> Having an amqp v1 message which is rejected by the amqp v1 broker results in 
> an endless failover loop blocking the JmsMessageProducer.send method.
> +*Expected:*+
> The failover of the qpid jms client tries to connect and send the rejected 
> message in a maximum of failover.maxReconnectAttempts.
> +*Actual:*+
> The result of the message send task is not taken into account for counting 
> the attempts.
> The JmsMessageProducer.send gets stuck forever.
>  # Failover connects to the failover uri and then resets the attempt counter.
>  # Sending the message results in REJECT outcome.
>  # Failover connects to the next failover uri and then resets the attempt 
> counter.
>  # Sending the message results in REJECT outcome.
>  # ... repeats and does not stop on maxReconnectAttempts.
> +*Code analysis:*+
> *FailoverProvider.java#L1282* already resets the connection attempts with 
> *reconnectControl.connectionEstablished()* not waiting for the success of the 
> send task, resulting in an endless failover loop.
> +*Background:*+
> In our case we are using RabbitMQ with the amqp v1 plugin. There are 
> scenarios where the broker is answering with REJECT outcomes (overflow 
> policy, high watermark etc). Right now we cannot use this feature because the 
> client is stuck and the RabbitMQ is flooded with reconnects.
> We would be very grateful having a fix. 
> Thanks for the good work and best regards
> Thomas
>  
>  



