[ 
https://issues.apache.org/jira/browse/DISPATCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiri Daněk updated DISPATCH-2059:
---------------------------------
    Description: 
Dispatch has an environment variable {{QPID_DISPATCH_RUNNER}} which is 
(according to a comment) intended to be used for running tests under valgrind. 
That comment is outdated, because memory checking is currently handled in a 
different way, in {{RuntimeChecks.cmake}}. One tool that would make sense to 
wrap dispatch with is rr, the record-replay debugger from Mozilla 
(https://rr-project.org/).

I've previously tried rr with (very) limited success in DISPATCH-782.

[~aconway] considered it while working on DISPATCH-902 and used it on other 
issues.

There has been an attempt 
https://issues.apache.org/jira/browse/DISPATCH-739?focusedCommentId=15983719&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15983719
 to use rr, which, however, did not survive in the mainline to the present day.

I have two problems with rr:

# Dispatch system-tests send SIGTERM to the subprocess itself, which is rr. 
What is necessary is to kill its children instead. Killing rr causes abrupt 
termination of the recording. When I issue ^C to a {{rr record qdrouterd -c 
...}} in the terminal, that signal goes correctly to the child. I am not sure 
what's happening in the test, or where the difference comes from. Explicitly 
killing only the children in the system test does the right thing. Sadly, doing 
that requires hacks; Python's subprocess module does not make it easy to query 
children. The os module has some ways; psutil is the easiest, but that's a 
3rd-party dependency.
# CLion debugger disconnects during replay when qdrouterd gets SIGTERM, even 
though the router handles that signal and continues running (cleanup).
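
A minimal sketch of the "kill only the children" workaround from the first item, using only the standard library. It is an illustration, not the code used in the system tests: the helper names {{child_pids}} and {{terminate_children}} are hypothetical, it is Linux-only (it scans {{/proc}}), and it assumes the recorded router is a direct child of the rr process.

```python
import os
import signal


def child_pids(parent_pid):
    """Return PIDs of the direct children of parent_pid by scanning /proc (Linux only)."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry, "rb") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name is parenthesised and may itself contain ')',
        # so split after the last ')'; the following fields are state, ppid, ...
        fields = stat.rpartition(b")")[2].split()
        if len(fields) >= 2 and int(fields[1]) == parent_pid:
            children.append(int(entry))
    return children


def terminate_children(wrapper, sig=signal.SIGTERM, timeout=10):
    """Signal the wrapper's children (not the wrapper itself), then wait for the wrapper.

    wrapper is a subprocess.Popen object, e.g. one running a recorder
    that exits by itself once the process it wraps has exited.
    """
    for pid in child_pids(wrapper.pid):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the child exited between the scan and the kill
    return wrapper.wait(timeout=timeout)
```

With a {{Popen}} object for the wrapped router, a test teardown could call {{terminate_children(router_process)}} where it currently sends SIGTERM to the process itself. psutil's {{Process.children()}} would replace the {{/proc}} scan at the cost of the extra dependency.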

One awesome feature of rr is that the recording can be replayed many times, 
backwards and forwards, and all memory addresses stay the same on every 
replay. This means that one can use {{watch -l *0x0000000}} watchpoints to 
watch specific places in memory, and use the {{reverse-cont}} gdb command. (rr 
emulates the gdb UI; it's a wrapper over gdb, actually, if I understand 
correctly.)

h3. Chaos mode

rr has a {{--chaos}} switch which tries to explore different thread schedules 
so as to reveal more crashes; that could be useful.
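
As a sketch of how a runner prefix taken from {{QPID_DISPATCH_RUNNER}} could be combined with this switch: the helper below prepends the variable's contents to a command line. The function name {{wrap_with_runner}} and the use of {{shlex}} are assumptions made for illustration; this is not necessarily how the system tests consume the variable.

```python
import os
import shlex


def wrap_with_runner(args, environ=None):
    """Prefix args with the runner command from QPID_DISPATCH_RUNNER, if set."""
    environ = os.environ if environ is None else environ
    runner = environ.get("QPID_DISPATCH_RUNNER", "")
    # shlex.split honours shell-style quoting; an empty string yields [].
    return shlex.split(runner) + list(args)


# "router.conf" is a hypothetical configuration file name.
command = wrap_with_runner(
    ["qdrouterd", "-c", "router.conf"],
    {"QPID_DISPATCH_RUNNER": "rr record --chaos"},
)
print(command)
# → ['rr', 'record', '--chaos', 'qdrouterd', '-c', 'router.conf']
```

When the variable is unset or empty, the command is returned unchanged, so the same code path would serve both wrapped and unwrapped runs.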

> Support running router under rr during test execution
> -----------------------------------------------------
>
>                 Key: DISPATCH-2059
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-2059
>             Project: Qpid Dispatch
>          Issue Type: Wish
>          Components: Tests
>    Affects Versions: 1.15.0
>            Reporter: Jiri Daněk
>            Priority: Major


