[ https://issues.apache.org/jira/browse/DISPATCH-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kim van der Riet updated DISPATCH-1110:
---------------------------------------
    Description: 
When running the Qpid Interop Test's AMQP large content test, a stand-alone 
router will intermittently hang and cause the test to time out.

The failure appears to be limited to the AMQP list and map types, and usually
occurs with the C++ client as the message sender. The C++, Python2 and Python3
receiver clients have all shown this failure, but on my hardware it reproduces
most readily with the Python2 receiver.

In all cases, the test fails when the router sends what I suppose is the final 
transfer of a large message (I have not added up/counted the bytes of the many 
preceding transfers) to the consumer. The consumer then sends a disposition, 
but the router does not respond again until the test times out. The consumer 
can be seen to send heartbeats to the router, but the router does not send any 
of its own.
{noformat}
Dir  No.     Time (s)        Source  Dest    Proto   Length  Info
... (plenty of 65550-sized frames R->C)
R->C 5976       3.454766        ::1     ::1     AMQP    65550
R->C 5977       3.454775        ::1     ::1     AMQP    65550
R->C 5978       3.454783        ::1     ::1     AMQP    48171
C->R 5982       3.529881        ::1     ::1     AMQP    115     disposition
C->R 5984       7.530704        ::1     ::1     AMQP    94      (empty)
C->R 5986       11.532306       ::1     ::1     AMQP    94      (empty)
...{noformat}
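To quantify the stall from a capture summary like the one above, a small helper
can parse the lines and measure how long the consumer keeps sending frames after
the router's last frame. This is a minimal sketch, assuming whitespace-separated
lines in the direction/number/time/source/destination/protocol/length/info order
shown; the names Frame, parse_summary and router_silence are hypothetical and
not part of QIT or Qpid Dispatch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    direction: str  # "R->C" (router to consumer) or "C->R" (consumer to router)
    number: int     # capture frame number
    time: float     # seconds since start of capture
    length: int     # captured frame length in bytes
    info: str       # trailing annotation, e.g. "disposition" or "(empty)"


def parse_summary(text: str) -> List[Frame]:
    """Hypothetical helper: parse capture summary lines, skipping '...' lines."""
    frames = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 7 or parts[0] not in ("R->C", "C->R"):
            continue  # not a frame line (ellipsis, {noformat} marker, etc.)
        frames.append(Frame(direction=parts[0],
                            number=int(parts[1]),
                            time=float(parts[2]),
                            length=int(parts[6]),
                            info=" ".join(parts[7:])))
    return frames


def router_silence(frames: List[Frame]) -> float:
    """Seconds between the router's last frame and the consumer's last frame."""
    last_router = max(f.time for f in frames if f.direction == "R->C")
    last_consumer = max(f.time for f in frames if f.direction == "C->R")
    return last_consumer - last_router
```

Run on frames 5978 through 5986 from the capture, router_silence gives about
8.08 seconds, during which the consumer sends two empty frames roughly 4
seconds apart while the router sends nothing.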
The router logs show no errors other than the framing error logged when the
consuming client is killed by the test timeout.
{noformat}
...
2018-08-29 12:50:23.191754 -0400 SERVER (info) [14]: Accepted connection to ::1:amqp from ::1:37262
2018-08-29 12:51:19.562695 -0400 SERVER (info) [14]: Connection from ::1:37262 (to ::1:amqp) failed: amqp:connection:framing-error connection aborted
{noformat}
The reproducer is not deterministic: the error occurs in about 50% of runs on
my hardware.
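Since the hang shows up in only about half the runs, a harness that reruns the
test many times and counts timeouts gives a steadier failure rate than single
runs. This is a minimal sketch, assuming the large content test can be launched
as a single command; failure_rate is a hypothetical helper, and the command
passed to it is a placeholder for whatever QIT invocation reproduces the hang.

```python
import subprocess
from typing import Sequence


def failure_rate(cmd: Sequence[str], runs: int, timeout_s: float) -> float:
    """Hypothetical helper: run cmd `runs` times and return the fraction that
    failed, counting both non-zero exits and timeouts as failures."""
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, timeout=timeout_s,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising; a hang lands here
            failures += 1
    return failures / runs
```

For example, failure_rate(QIT_CMD, runs=20, timeout_s=300), where QIT_CMD is a
placeholder list holding the actual test command line, would report the
fraction of hung or failed runs.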

  was:
When running the Qpid Interop Test's AMQP large content test, a stand-alone 
router will intermittently hang and cause the test to time out.

The failure appears to be limited to either the AMQP list or map types, and 
usually with the message producer being the C++ client.  Both C++, Python2 and 
Python3 consumer clients have all seen this failure, but the Python2 client 
seems to reproduce more readily on my hardware.

In all cases, the test fails when the router sends what I suppose is the final 
transfer of a large message (I have not added up/counted the bytes of the many 
preceding transfers) to the consumer. The consumer then sends a disposition, 
but the router does not respond again until the test times out. The consumer 
can be seen to send heartbeats to the router, but the router does not send any 
of its own.
{noformat}
... (plenty of 65550-sized frames R->C)
R->C 5976       3.454766        ::1     ::1     AMQP    65550
R->C 5977       3.454775        ::1     ::1     AMQP    65550
R->C 5978       3.454783        ::1     ::1     AMQP    48171
C->R 5982       3.529881        ::1     ::1     AMQP    115     disposition
C->R 5984       7.530704        ::1     ::1     AMQP    94      (empty)
C->R 5986       11.532306       ::1     ::1     AMQP    94      (empty)
...{noformat}
There are no errors to be seen in the router logs other than when the consuming 
client is killed owing to the test timeout.
{noformat}
...
2018-08-29 12:50:23.191754 -0400 SERVER (info) [14]: Accepted connection to ::1:amqp from ::1:37262
2018-08-29 12:51:19.562695 -0400 SERVER (info) [14]: Connection from ::1:37262 (to ::1:amqp) failed: amqp:connection:framing-error connection aborted
{noformat}
The reproducer is not very tight on this, and the error occurs about 50% of the 
time on my hardware.


> Intermittent router hang while running QIT's AMQP large content test
> --------------------------------------------------------------------
>
>                 Key: DISPATCH-1110
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1110
>             Project: Qpid Dispatch
>          Issue Type: Bug
>         Environment: Standard QIT environment.
> Once QIT is built and installed, the environment is set using the config.sh 
> file. See QUICKSTART for details.
>            Reporter: Kim van der Riet
>            Priority: Major
>         Attachments: qdrouterd.conf
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
