[ https://issues.apache.org/jira/browse/DISPATCH-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642461#comment-16642461 ]

Chuck Rolke commented on DISPATCH-1110:
---------------------------------------

The title of this issue, "Intermittent router hang while running QIT's AMQP 
large content test", is a little misleading. The router never hangs. The hang 
is in the test Receiver, which waits for 8 messages; only 7 arrive before the 
test times out.

As designed, the test Sender sends 8 unacknowledged messages into the router 
network and then drops the AMQP connection after 2, 3, 4, or 5 messages 
have been confirmed through the on_tracker_accept callback. What is the 
expected outcome for the remaining unconfirmed messages?
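
The Sender behavior described above can be modeled without a router. The 
sketch below is a hypothetical pure-Python simulation, not the QIT code: the 
class name SenderModel, its fields, and the disconnect_after parameter are 
assumptions made for illustration. It shows only the bookkeeping: 8 messages 
go out, and the connection is dropped once a chosen number of accepts has 
been counted, which leaves the rest unconfirmed.

```python
class SenderModel:
    """Hypothetical model of the test Sender's bookkeeping (not the QIT code)."""

    def __init__(self, total=8, disconnect_after=3):
        self.total = total                        # messages sent up front
        self.disconnect_after = disconnect_after  # 2, 3, 4, or 5 in the test
        self.confirmed = 0
        self.connected = True

    def on_tracker_accept(self):
        """Count one accepted delivery; drop the connection at the threshold."""
        if not self.connected:
            return  # accepts arriving after the disconnect are never seen
        self.confirmed += 1
        if self.confirmed >= self.disconnect_after:
            self.connected = False

    def unconfirmed(self):
        """Messages sent but not confirmed when the connection closed."""
        return self.total - self.confirmed


sender = SenderModel(total=8, disconnect_after=3)
for _ in range(8):  # the peer would accept all 8 if given the chance
    sender.on_tracker_accept()
print(sender.confirmed, sender.unconfirmed())  # 3 5
```

With disconnect_after=3, five of the eight messages are still unconfirmed at 
the moment of disconnect, and the model says nothing about their fate. That 
is the open question.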

I added instrumentation to the QIT Sender and Receiver to record timestamps 
and more internal information about test progress. In particular, I added an 
'on_tracker_release' callback. When this callback is invoked, it means the 
router could not deliver a message to its destination for some reason. During 
one of the Receiver hang events, the Sender was notified that a message had 
been released:

{{stderr=
 1539028405.939654 on_sendable: sent 8 messages
 1539028405.969454 on_sendable: doing nothing. Already sent 8 messages
 1539028405.982049 on_sendable: doing nothing. Already sent 8 messages
 1539028405.982106 on_tracker_release: msgsConfirmed 0
 amqp_large_content_test::Sender::on_connection_error: 
amqp:session:invalid-field: sequencing error, expected delivery-id 5, got 4
 1539028405.982304 on_container_stop: msgsConfirmed 0
 amqp_large_content_test: Sender error: on_connection_error
 }}
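
Timestamped output of this kind is easy to post-process. The following is a 
minimal sketch, assuming each instrumented line has the layout 
"<epoch seconds> <callback name>: <detail>"; the regular expression and the 
parse() helper are hypothetical and not part of QIT. It measures how long 
after the first on_sendable line the release notification arrived.

```python
import re
from decimal import Decimal

# Timestamped lines copied from the stderr excerpt above.
LOG = """\
1539028405.939654 on_sendable: sent 8 messages
1539028405.969454 on_sendable: doing nothing. Already sent 8 messages
1539028405.982049 on_sendable: doing nothing. Already sent 8 messages
1539028405.982106 on_tracker_release: msgsConfirmed 0
1539028405.982304 on_container_stop: msgsConfirmed 0
"""

# Assumed line layout: "<epoch seconds> <callback name>: <detail>".
LINE = re.compile(r"^\s*(\d+\.\d+)\s+(\w+):\s*(.*)$")


def parse(log_text):
    """Return (timestamp, callback, detail) tuples for timestamped lines."""
    events = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:  # lines without a timestamp, such as error text, are skipped
            events.append(
                (Decimal(match.group(1)), match.group(2), match.group(3))
            )
    return events


events = parse(LOG)
first_send = next(t for t, name, _ in events if name == "on_sendable")
release = next(t for t, name, _ in events if name == "on_tracker_release")
print(release - first_send)  # 0.042452
```

Decimal is used instead of float so that the microsecond digits survive the 
subtraction exactly. In this excerpt the release arrives about 42 ms after 
the 8 messages were sent.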

When the Sender is notified of a tracker release, the Receiver is 
guaranteed not to receive all of the messages.
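
This reasoning can be stated as simple accounting. The sketch below is a 
hypothetical illustration, not the QIT instrumentation: DeliveryLedger and 
its method names are invented here, and it assumes that a released message 
is not resent by the Sender (the log shows the Sender doing nothing once its 
8 messages are sent).

```python
import time


class DeliveryLedger:
    """Hypothetical tally of delivery outcomes as seen by the Sender."""

    def __init__(self, total):
        self.total = total
        self.accepted = 0
        self.released = 0
        self.log = []

    def on_tracker_accept(self):
        self.accepted += 1

    def on_tracker_release(self):
        # Record a timestamped line in the same style as the stderr excerpt.
        self.released += 1
        self.log.append(
            "%.6f on_tracker_release: msgsConfirmed %d"
            % (time.time(), self.accepted)
        )

    def max_receivable(self):
        # Assumption: a released message is not resent, so it never arrives.
        return self.total - self.released

    def receiver_can_complete(self):
        return self.max_receivable() >= self.total


ledger = DeliveryLedger(total=8)
ledger.on_tracker_release()            # one message released by the router
print(ledger.max_receivable())         # 7
print(ledger.receiver_can_complete())  # False
```

One release out of 8 leaves at most 7 messages for the Receiver, which 
matches the 7 of 8 seen when the test times out.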

> Intermittent router hang while running QIT's AMQP large content test
> --------------------------------------------------------------------
>
>                 Key: DISPATCH-1110
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1110
>             Project: Qpid Dispatch
>          Issue Type: Bug
>         Environment: Standard QIT environment.
> Once QIT is built and installed, the environment is set using the config.sh 
> file. See QUICKSTART for details.
>            Reporter: Kim van der Riet
>            Assignee: Ganesh Murthy
>            Priority: Major
>         Attachments: qdrouterd.conf
>
>
> When running the Qpid Interop Test's AMQP large content test, a stand-alone 
> router will intermittently hang and cause the test to time out.
> The failure appears to be limited to either the AMQP list or map types, and 
> usually with the C++ client as the message sender.  The C++, Python2 and 
> Python3 as receiver clients have all seen this failure, but the Python2 
> receiver client seems to reproduce more readily on my hardware.
> In all cases, the test fails when the router sends what I suppose is the 
> final transfer of a large message (I have not added up/counted the bytes of 
> the many preceding transfers) to the consumer. The consumer then sends a 
> disposition, but the router does not respond again until the test times out. 
> The consumer can be seen to send heartbeats to the router, but the router 
> does not send any of its own.
> {noformat}
> ... (plenty of 65550-sized frames R->C)
> R->C 5976     3.454766        ::1     ::1     AMQP    65550
> R->C 5977     3.454775        ::1     ::1     AMQP    65550
> R->C 5978     3.454783        ::1     ::1     AMQP    48171
> C->R 5982     3.529881        ::1     ::1     AMQP    115     disposition
> C->R 5984     7.530704        ::1     ::1     AMQP    94      (empty)
> C->R 5986     11.532306       ::1     ::1     AMQP    94      (empty)
> ...{noformat}
> There are no errors to be seen in the router logs other than when the 
> consuming client is killed owing to the test timeout.
> {noformat}
> ...
> 2018-08-29 12:50:23.191754 -0400 SERVER (info) [14]: Accepted connection to 
> ::1:amqp from ::1:37262
> 2018-08-29 12:51:19.562695 -0400 SERVER (info) [14]: Connection from 
> ::1:37262 (to ::1:amqp) failed: amqp:connection:framing-error connection 
> aborted
> {noformat}
> The reproducer is not very tight on this, and the error occurs about 50% of 
> the time on my hardware.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org
