[
https://issues.apache.org/jira/browse/DISPATCH-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kim van der Riet updated DISPATCH-1110:
---------------------------------------
Description:
When running the Qpid Interop Test's AMQP large content test, a stand-alone
router will intermittently hang and cause the test to time out.
The failure appears to be limited to the AMQP list and map types, and usually
occurs when the C++ client is the message sender. The C++, Python2 and Python3
receiver clients have all seen this failure, but the Python2 receiver client
seems to reproduce it more readily on my hardware.
In all cases, the test fails when the router sends what I suppose is the final
transfer of a large message (I have not added up/counted the bytes of the many
preceding transfers) to the consumer. The consumer then sends a disposition,
but the router does not respond again until the test times out. The consumer
can be seen to send heartbeats to the router, but the router does not send any
of its own.
{noformat}
... (plenty of 65550-sized frames R->C)
R->C 5976 3.454766 ::1 ::1 AMQP 65550
R->C 5977 3.454775 ::1 ::1 AMQP 65550
R->C 5978 3.454783 ::1 ::1 AMQP 48171
C->R 5982 3.529881 ::1 ::1 AMQP 115 disposition
C->R 5984 7.530704 ::1 ::1 AMQP 94 (empty)
C->R 5986 11.532306 ::1 ::1 AMQP 94 (empty)
...{noformat}
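The trace above can be checked mechanically. The following is a minimal
sketch, not part of QIT or the router: it assumes the columns are direction,
frame number, capture time in seconds, source, destination, protocol, frame
length and an optional note, as the excerpt suggests. The names Frame,
parse_trace, router_silence and router_bytes are hypothetical helpers
introduced here for illustration. The lengths are the frame sizes reported by
the capture, so their sum covers only the frames shown and is not the message
size.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    direction: str   # "R->C" (router to consumer) or "C->R"
    number: int      # frame number in the capture
    time: float      # seconds since the start of the capture
    length: int      # frame length in bytes, as shown in the trace
    info: str        # trailing note, e.g. "disposition" or "(empty)"


def parse_trace(text):
    """Parse lines shaped like 'R->C 5976 3.454766 ::1 ::1 AMQP 65550'."""
    frames = []
    for line in text.splitlines():
        fields = line.split()
        # Skip elisions ("...") and anything that is not a frame line.
        if len(fields) < 7 or fields[0] not in ("R->C", "C->R"):
            continue
        frames.append(Frame(fields[0], int(fields[1]), float(fields[2]),
                            int(fields[6]), " ".join(fields[7:])))
    return frames


def router_silence(frames):
    """Seconds between the router's last frame and the last frame seen."""
    router_times = [f.time for f in frames if f.direction == "R->C"]
    if not router_times:
        return None
    return frames[-1].time - max(router_times)


def router_bytes(frames):
    """Total length of the router-to-consumer frames present in the excerpt."""
    return sum(f.length for f in frames if f.direction == "R->C")


TRACE = """\
R->C 5976 3.454766 ::1 ::1 AMQP 65550
R->C 5977 3.454775 ::1 ::1 AMQP 65550
R->C 5978 3.454783 ::1 ::1 AMQP 48171
C->R 5982 3.529881 ::1 ::1 AMQP 115 disposition
C->R 5984 7.530704 ::1 ::1 AMQP 94 (empty)
C->R 5986 11.532306 ::1 ::1 AMQP 94 (empty)
"""

frames = parse_trace(TRACE)
print(f"router silent for {router_silence(frames):.3f} s")
print(f"router bytes in excerpt: {router_bytes(frames)}")
```

On the six frames shown, this reports a router silence of about 8.08 seconds
(while the consumer sends two empty frames roughly 4 seconds apart) and 179271
bytes of router-to-consumer frames. Running it over a full capture export
would give the total that was not counted by hand above.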
There are no errors to be seen in the router logs other than when the consuming
client is killed owing to the test timeout.
{noformat}
...
2018-08-29 12:50:23.191754 -0400 SERVER (info) [14]: Accepted connection to ::1:amqp from ::1:37262
2018-08-29 12:51:19.562695 -0400 SERVER (info) [14]: Connection from ::1:37262 (to ::1:amqp) failed: amqp:connection:framing-error connection aborted
{noformat}
The reproducer is not very reliable: the error occurs about 50% of the time
on my hardware.
was:
When running the Qpid Interop Test's AMQP large content test, a stand-alone
router will intermittently hang and cause the test to time out.
The failure appears to be limited to either the AMQP list or map types, and
usually with the message producer being the C++ client. Both C++, Python2 and
Python3 consumer clients have all seen this failure, but the Python2 client
seems to reproduce more readily on my hardware.
In all cases, the test fails when the router sends what I suppose is the final
transfer of a large message (I have not added up/counted the bytes of the many
preceding transfers) to the consumer. The consumer then sends a disposition,
but the router does not respond again until the test times out. The consumer
can be seen to send heartbeats to the router, but the router does not send any
of its own.
{noformat}
... (plenty of 65550-sized frames R->C)
R->C 5976 3.454766 ::1 ::1 AMQP 65550
R->C 5977 3.454775 ::1 ::1 AMQP 65550
R->C 5978 3.454783 ::1 ::1 AMQP 48171
C->R 5982 3.529881 ::1 ::1 AMQP 115 disposition
C->R 5984 7.530704 ::1 ::1 AMQP 94 (empty)
C->R 5986 11.532306 ::1 ::1 AMQP 94 (empty)
...{noformat}
There are no errors to be seen in the router logs other than when the consuming
client is killed owing to the test timeout.
{noformat}
...
2018-08-29 12:50:23.191754 -0400 SERVER (info) [14]: Accepted connection to
::1:amqp from ::1:37262
2018-08-29 12:51:19.562695 -0400 SERVER (info) [14]: Connection from ::1:37262
(to ::1:amqp) failed: amqp:connection:framing-error connection aborted
{noformat}
The reproducer is not very tight on this, and the error occurs about 50% of the
time on my hardware.
> Intermittent router hang while running QIT's AMQP large content test
> --------------------------------------------------------------------
>
> Key: DISPATCH-1110
> URL: https://issues.apache.org/jira/browse/DISPATCH-1110
> Project: Qpid Dispatch
> Issue Type: Bug
> Environment: Standard QIT environment.
> Once QIT is built and installed, the environment is set using the config.sh
> file. See QUICKSTART for details.
> Reporter: Kim van der Riet
> Priority: Major
> Attachments: qdrouterd.conf