[
https://issues.apache.org/jira/browse/PROTON-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445250#comment-17445250
]
Ken Giusti commented on PROTON-2466:
------------------------------------
This is a difficult issue to reproduce. In my experience it can take a few
hours and the resulting log files are huge.
To reproduce:
# check out head of the qdrouter 1.18.x branch
# back out the pointer clear patch that prevents the crash from occurring:
## commit 6734891419fcafdbc87d40eca269d07821c1b813 DISPATCH-2286: reset the raw conn context when handling disconnect
# run two routers using the attached configurations (qdrouterd-A.conf and qdrouterd-B.conf):
## rm -f qdrouterd-A-log.txt ; qdrouterd -c qdrouterd-A.conf & rm -f qdrouterd-B-log.txt ; qdrouterd -c qdrouterd-B.conf &
# install iperf3
# spawn an iperf3 server for the router to connect to:
## iperf3 -s -p 8080 &
# run iperf3 clients to generate traffic in a loop:
## while iperf3 -c 127.0.0.1 -p 8000 -t 5 -P 8; do echo "OK"; sleep 2; done
# wait for crash
> raw connection posts wake events after disconnect event is handled
> ------------------------------------------------------------------
>
> Key: PROTON-2466
> URL: https://issues.apache.org/jira/browse/PROTON-2466
> Project: Qpid Proton
> Issue Type: Bug
> Components: proton-c
> Affects Versions: proton-c-0.36.0
> Reporter: Ken Giusti
> Priority: Major
> Attachments: qdrouterd-A.conf, qdrouterd-B.conf
>
>
> While running tcp stress tests against qdrouterd a crash occurred. The crash
> was due to a stale pointer dereference.
> qdrouterd code has been patched to properly clear the pointer and check for
> null in the affected code path. However...
> ... the access occurred while processing a PN_RAW_CONNECTION_WAKE event that
> arrived on a raw connection *after* a PN_RAW_CONNECTION_DISCONNECTED event
> had already arrived on the same raw connection.
> IIUC the PN_RAW_CONNECTION_DISCONNECTED event is supposed to be the last
> event generated on a raw connection, and once that event has been handled the
> raw connection is released. If that is correct then the arrival of the
> following WAKE event is a bug.
> Here is the log output from the router just prior to the crash (filtered on
> the affected connection):
> $ tail C140.txt
>
> 2021-11-16 17:11:10.925728 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_WAKE connector
>
> 2021-11-16 17:11:10.926990 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_WAKE connector
>
> 2021-11-16 17:11:10.927001 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_READ connector Event
>
> 2021-11-16 17:11:10.927034 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_READ Read 0 bytes. Total read 0 bytes
>
> 2021-11-16 17:11:10.927596 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers
> wrote 32768 bytes. Total written 36929573 bytes
>
> 2021-11-16 17:11:10.928207 -0500 TCP_ADAPTOR (debug) [C140][L322]
> PN_RAW_CONNECTION_CLOSED_READ connector
>
> 2021-11-16 17:11:10.928591 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_CLOSED_WRITE connector
>
> 2021-11-16 17:11:10.929160 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers
> wrote 32768 bytes. Total written 36962341 bytes
>
> *2021-11-16 17:11:10.929410 -0500 TCP_ADAPTOR (info) [C140]
> PN_RAW_CONNECTION_DISCONNECTED connector*
> *2021-11-16 17:11:10.929915 -0500 TCP_ADAPTOR (debug) [C140]
> PN_RAW_CONNECTION_WAKE connector*
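
Since the router logs are huge, it may help to scan them mechanically for the pattern shown above rather than by eye. The following is only a sketch: the function name find_post_disconnect_events is made up for illustration, and it assumes that each log entry sits on a single line in the actual log file (the excerpt above is wrapped by the mailer) and that a connection id such as [C140] is not reused within one router run.

```shell
#!/bin/sh
# Print every raw connection event logged after PN_RAW_CONNECTION_DISCONNECTED
# on the same connection. Reads router log lines from the named files, or from
# stdin when no file is given.
# Assumption: one log entry per line, connection ids look like "[C140]".
find_post_disconnect_events() {
    awk '
    # Only consider lines that carry a connection id such as [C140].
    match($0, /\[C[0-9]+\]/) {
        conn = substr($0, RSTART, RLENGTH)
        # Report any raw connection event seen once the id is marked.
        if ((conn in disconnected) && $0 ~ /PN_RAW_CONNECTION_/) {
            print "after DISCONNECTED: " $0
        }
        # Mark the id as disconnected for all following lines.
        if ($0 ~ /PN_RAW_CONNECTION_DISCONNECTED/) {
            disconnected[conn] = 1
        }
    }' "$@"
}
```

For example, `find_post_disconnect_events qdrouterd-A-log.txt` would print the late WAKE line for C140 if the log contains the sequence quoted above, and print nothing for a run where DISCONNECTED really was the last event on every connection.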