[
https://issues.apache.org/jira/browse/DISPATCH-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287149#comment-17287149
]
Charles E. Rolke commented on DISPATCH-1968:
--------------------------------------------
Ref: file DISPATCH-1968-test_12-crash.html
This file is the log of failed router INTA when running a subset of the tests.
The log was not generated from the PR branch but from
[https://github.com/ChugR/qpid-dispatch/tree/1.15-cr-crash] at commit 4dc1a.
The only difference is a bunch of logs printing the physical address of
connection objects. Logically it is still doing what the PR branch does.
The three tests use the modified echo server that slams the connection shut
when data to be echoed is received. The first two tests do not crash the
router. Test_10 sends 100 bytes and Test_11 sends 1000 bytes. The tests fail
because the echo didn't happen but the router survives. Test_12, which sends
500k bytes, is the interesting one.
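A minimal sketch of the peer behavior these tests rely on: a server that reads the first chunk and closes instead of echoing. This is not the modified echo server from the test suite. The names serve_once_and_slam and run_client are hypothetical, and the client here connects directly to the server with no router in between, so it only illustrates the close-on-first-data pattern.

```python
import socket
import threading

def serve_once_and_slam(listener):
    """Accept one connection, read the first chunk, then close without echoing."""
    conn, _ = listener.accept()
    try:
        conn.recv(4096)      # first data packet arrives
    finally:
        conn.close()         # close instead of echoing it back

def run_client(payload_size):
    """Send payload_size bytes at the closing server; return whatever was echoed."""
    listener = socket.create_server(("127.0.0.1", 0))   # ephemeral port
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once_and_slam,
                              args=(listener,), daemon=True)
    server.start()
    echoed = b""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        try:
            client.sendall(b"x" * payload_size)
            client.shutdown(socket.SHUT_WR)
            while chunk := client.recv(4096):
                echoed += chunk
        except OSError:
            # Reset, broken pipe, or timeout: the peer closed mid-stream.
            pass
    server.join(timeout=5)
    listener.close()
    return echoed

if __name__ == "__main__":
    # Payload sizes taken from Test_10, Test_11 and Test_12 above.
    for size in (100, 1000, 500_000):
        print(size, "bytes sent,", len(run_client(size)), "bytes echoed")
```

With the larger payload the client is typically still sending when the server closes, which is the mid-stream close the third test is meant to exercise.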
At 09:32:47.268976 the echo client starts. The test progresses through the
normal startup and makes a connection [C25] to the server at 09:32:47.305538.
At 09:32:47.309816 the server closes the connection after receiving the first
data packet.
Proton issues a PN_RAW_CONNECTION_WRITTEN event and a
PN_RAW_CONNECTION_NEED_WRITE_BUFFERS before closing the connection for read
and write at 09:32:47.310958.
The TCP adaptor continues to read the input stream on [C24] and write buffers
into the closed [C25].
At 09:32:47.312627, after the last TCP_ADAPTOR log line, the router crashed.
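To put the timeline in perspective, the intervals between the timestamps quoted above can be computed with a short script. The timestamps are copied from this comment; the event labels and the helper name elapsed_us are shorthand introduced here for illustration.

```python
from datetime import datetime, timedelta

# Timestamps copied from the comment; labels are illustrative shorthand.
EVENTS = [
    ("echo client starts", "09:32:47.268976"),
    ("[C25] connected to server", "09:32:47.305538"),
    ("server closes connection", "09:32:47.309816"),
    ("proton closes read and write", "09:32:47.310958"),
    ("router crashes", "09:32:47.312627"),
]

def elapsed_us(events):
    """Return (label, us since previous event, us since first event) per event."""
    times = [datetime.strptime(ts, "%H:%M:%S.%f") for _, ts in events]
    one_us = timedelta(microseconds=1)
    rows = []
    for i, (label, _) in enumerate(events):
        step = (times[i] - times[i - 1]) // one_us if i else 0
        total = (times[i] - times[0]) // one_us
        rows.append((label, step, total))
    return rows

if __name__ == "__main__":
    for label, step, total in elapsed_us(EVENTS):
        print(f"{total:>7} us (+{step:>6} us)  {label}")
```

By this arithmetic the crash comes about 2.8 ms after the server closes the connection and about 1.7 ms after proton closes it for read and write.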
GDB puts the failure in proton epoll wake(pcontext_t *ctx) where ctx points to
a structure full of 0x666666666666666. I'm unsure if this is from
MALLOC_PERTURB_ or if proton is deliberately writing this pattern. Either way it
looks like the ctx pointer is pointing to a block of memory that has been freed.
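For reference, glibc's mallopt(3) documents that with M_PERTURB (or the MALLOC_PERTURB_ environment variable) set to a nonzero value, freed memory is filled with the low byte of that value and newly allocated memory with its complement. The sketch below only restates that rule; the helper names are hypothetical, and whether this run had the variable set, or set to 102 (0x66), is an assumption that the log does not confirm.

```python
def perturb_fill(value):
    """Return (alloc_byte, free_byte) for a given M_PERTURB / MALLOC_PERTURB_ value."""
    free_byte = value & 0xFF       # freed memory: low byte of the value
    alloc_byte = ~value & 0xFF     # fresh allocations: complement of that byte
    return alloc_byte, free_byte

def looks_freed(block, value):
    """True if every byte of a non-empty block equals the free-fill byte."""
    _, free_byte = perturb_fill(value)
    return len(block) > 0 and all(b == free_byte for b in block)

if __name__ == "__main__":
    alloc_byte, free_byte = perturb_fill(102)   # 102 == 0x66, assumed setting
    print(f"alloc fill 0x{alloc_byte:02x}, free fill 0x{free_byte:02x}")
    print(looks_freed(bytes([0x66]) * 64, 102))
```

If the variable was set to a value whose low byte is 0x66, a block full of 0x66 is consistent with memory that has already been passed to free().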
> Crash after running series of 1Mb iperf3 against TCP adaptor
> ------------------------------------------------------------
>
> Key: DISPATCH-1968
> URL: https://issues.apache.org/jira/browse/DISPATCH-1968
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Protocol Adaptors
> Affects Versions: 1.15.0
> Environment: Fedora 32 bare metal 64-bit.
> Dispatch at 1.15 release
> Proton git branch master @ 5e7d7af8f
> Reporter: Charles E. Rolke
> Priority: Major
> Attachments: DISPATCH-1968-test_12-crash.html, INTA.conf
>
>
> h2. Setup
> Running with a minimal TCP adaptor listener / connector on a single router.
> See attached INTA.conf. These processes run on a single laptop.
> Start an iperf3 server on default port 5201:
> iperf3 -s
> Run iperf3 client in a loop to port 5202 served by the TCP adaptor.
> iperf3 -c hostname -p 5202 -n 1000000
> h2. Issues
> After a few loops the router crashes with malloc having a corrupted doubly
> linked list.
> Sometimes the test client hangs for a few seconds until the iperf server
> times out.
> Qdstat shows many resource leaks of qd_buffer_t and stream data objects.
> h2. Observations
> h3. Tracing a single iperf3 session
> A wireshark trace of a single iperf3 session shows the client opening two
> connections to the router and the router opening two connections to the
> server. This is expected.
> As the test runs there is a certain amount of chat between the client and
> server that works as expected. These messages are test setup and are not part
> of the iperf mission payload data.
> Then the payload data starts. After the server has accepted 8kbytes of iperf
> payload (in 16 512-byte network packets!!!) the server closes the connection
> to the TCP connector with a FIN. A few microseconds later the TCP connector
> sends another 512-byte packet to which the iperf server responds with a
> RST.
> Shortly thereafter the connections close with a bunch of TCP FIN packets.
> The router did not crash.
> h3. Running with asan and valgrind memcheck
> Running with either of these tools was inconclusive and did not reveal any
> stray memory writes or double frees that could corrupt the malloc heap.
> h2. Next steps
> Having the network peer of the TCP connector close the connection mid-stream
> is a pattern that is not tested in the self tests. A test to generate this
> pattern is in progress.