[
https://issues.apache.org/jira/browse/QPIDJMS-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327132#comment-17327132
]
Ravi Nirmal commented on QPIDJMS-534:
-------------------------------------
Hi team, can someone please provide an update on this ticket?
cc - [~robbie]
> BalancedProviderFuture.sync stuck forever during connection recovery
> --------------------------------------------------------------------
>
> Key: QPIDJMS-534
> URL: https://issues.apache.org/jira/browse/QPIDJMS-534
> Project: Qpid JMS
> Issue Type: Bug
> Components: qpid-jms-client
> Affects Versions: 0.42.0
> Reporter: Ravi Nirmal
> Priority: Major
> Attachments: logs.txt, thread-dump.txt
>
>
> Recently, we observed an issue in our production environment where the
> BalancedProviderFuture.sync method blocks forever during connection recovery
> and never returns. We have observed this on 2 hosts in the last week, and the
> only remedy is to restart the server.
> I am attaching a thread dump that shows the issue and how it blocks other
> threads; [^thread-dump.txt] contains the details of all threads.
> h3. Details of Investigation
> * This issue occurs during connection recovery, when failing over from one
> server to another.
> * While debugging, I can see that the BalancedProviderFuture.sync method is
> waiting for its state to be updated, and that state is updated by the
> AmqpProvider thread. The thread dump shows no AmqpProvider thread in a stuck
> state, which indicates that AmqpProvider has finished its work, yet the state
> of the given BalancedProviderFuture object is never updated.
> * In the successful case, the state of the BalancedProviderFuture object is
> updated in the following sequence:
> ** JmsSession.onConnectionRecovery creates a BalancedProviderFuture object and
> then calls provider.create.
> ** provider.create (i.e. AmqpProvider.create) submits a task to the
> serializer. This create method has proper error handling: it either calls
> pumpToProtonTransport or calls request.onFailure (which updates the state of
> the BalancedProviderFuture if an exception occurs).
> ** Once that task finishes (basically after pumpToProtonTransport), the
> serializer calls AmqpProvider.onData, which updates the state of the
> BalancedProviderFuture object.
> * I have observed that if an exception is thrown in AmqpProvider.onData, the
> state of the BalancedProviderFuture is never updated and
> BalancedProviderFuture.sync blocks forever. The exception can occur when the
> protonTransport tail is already closed (probably because of an idle timeout or
> some other transport-related issue).
> * I have also observed that in some cases (idle timeout or transport errors),
> after the task started by provider.create (i.e. AmqpProvider.create)
> completes, the serializer does not call AmqpProvider.onData but instead calls
> AmqpProvider.onTransportError or AmqpProvider.onTransportClosed, and I cannot
> see any code in onTransportError or onTransportClosed that updates the state
> of the BalancedProviderFuture object.
> * I am attaching [^logs.txt], which shows some errors. These errors occurred
> when the state of the BalancedProviderFuture was not updated and the sync
> method blocked forever.
> * Please note that we are using the URL
> failover:(amqp://localhost:5672,amqp://localhost:5682)?jms.sendTimeout=5000
> and Qpid JMS version 0.42.0.
> I have found two older tickets, QPIDJMS-458 and QPIDJMS-464, that show a
> similar issue, but I believe this issue is different and may need to be fixed
> separately.
> Can someone please take a look at this? It has become a critical issue in our
> production environment, and we have no option other than restarting our
> services.
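The failure mode in the investigation can be modelled in miniature. This is a minimal sketch, not Qpid JMS code: SketchFuture, SketchProvider, completePending and the failPendingOnError flag are hypothetical names. It assumes, as the report suggests, that a pending request is completed only from onData, so an exception inside onData, or a transport error that does not fail outstanding requests, leaves sync() waiting forever. The real BalancedProviderFuture.sync() has no timeout; the timed wait here only lets the demo detect a request that is never completed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a provider future: sync() waits until another thread completes it. */
class SketchFuture {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile Throwable error;

    void onSuccess() {
        done.countDown();
    }

    void onFailure(Throwable cause) {
        error = cause;
        done.countDown();
    }

    /** Timed wait so the demo can detect a request that nobody ever completes. */
    void sync(long timeout, TimeUnit unit) throws IOException, InterruptedException, TimeoutException {
        if (!done.await(timeout, unit)) {
            throw new TimeoutException("request was never completed");
        }
        if (error != null) {
            throw new IOException("request failed", error);
        }
    }
}

/** Hypothetical provider that runs all state changes on one serializer thread. */
class SketchProvider {
    private final ExecutorService serializer = Executors.newSingleThreadExecutor();
    private final List<SketchFuture> pending = new ArrayList<>(); // touched only on the serializer
    private final boolean failPendingOnError;

    SketchProvider(boolean failPendingOnError) {
        this.failPendingOnError = failPendingOnError;
    }

    /** Like create(): queue work on the serializer; the request completes when a reply arrives. */
    void create(SketchFuture request) {
        serializer.execute(() -> pending.add(request));
    }

    /** Like onData(): inbound bytes complete pending requests, unless processing throws. */
    void onData(boolean tailClosed) {
        serializer.execute(() -> {
            try {
                if (tailClosed) {
                    throw new IllegalStateException("transport tail already closed");
                }
                completePending(null);
            } catch (RuntimeException e) {
                handleTransportError(e);
            }
        });
    }

    /** Like onTransportError()/onTransportClosed(): called from the transport thread. */
    void onTransportError(Throwable cause) {
        serializer.execute(() -> handleTransportError(cause));
    }

    private void handleTransportError(Throwable cause) {
        // With the flag off, the error is dropped here and every caller blocked in sync() keeps waiting.
        if (failPendingOnError) {
            completePending(cause);
        }
    }

    private void completePending(Throwable cause) {
        for (SketchFuture request : pending) {
            if (cause == null) {
                request.onSuccess();
            } else {
                request.onFailure(cause);
            }
        }
        pending.clear();
    }

    void shutdown() {
        serializer.shutdownNow();
    }
}
```

With failPendingOnError set to true, an exception in onData or a call to onTransportError fails every pending request, so sync() throws instead of blocking. This is one possible shape for a fix under the stated assumptions, not the project's actual change.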
--
This message was sent by Atlassian Jira
(v8.3.4#803005)