On Fri, 2017-05-26 at 10:48 +0200, Jiri Danek wrote:
> On Fri, May 26, 2017 at 12:18 AM, Alan Conway (JIRA) <[email protected]> wrote:
>
> > [ https://issues.apache.org/jira/browse/DISPATCH-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025475#comment-16025475 ]
> >
> > Alan Conway commented on DISPATCH-777:
> > --------------------------------------
> >
> > This appears to be a race condition (double free) in the epoll
> > proactor, fixing...
> >
>
> Could you maybe describe in more detail how you went about triaging
> it? So that I know what more steps I can take next time I am
> reporting a crash like this. Thank you.
> --
> Jiří Daněk
> Messaging QA

I ran the test in a loop with 'rr' http://rr-project.org/ until it
crashed. 'rr' is a truly amazing extension to gdb - it records a
complete execution trace of the program (without imposing much run-time
overhead) that you can replay forwards *and backwards* in gdb,
examining memory etc. as you normally would.

So playing the program up to the segfault in rr showed me that it
crashed on a pointer with the value 0x4242424242. Now I have this in my
.bashrc:

    export MALLOC_PERTURB_=66 # 0x42

So freed memory is always filled with the hex pattern 424242. Now I
know the pointer is in memory that was previously freed so I do:

    watch -l ptr   # Standard gdb watchpoint on the pointer
    reverse-cont   # rr magic - continue *backwards* to the watchpoint

This runs the program back to the exact point where it was freed!

The rest is knowledge of the code: the crash comes just after the
pointer was extracted from epoll_wait(), the free is during finalization
of a closed connection - so I'm fairly sure there's a race where we
sometimes free memory used by a connection while it is still registered
with epoll.
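
For anyone wanting to reproduce the "run it in a loop until it crashes" step, here is a rough sketch of a shell helper. The function name run_until_fail and its arguments are made up for illustration and are not part of rr or the original workflow. It simply re-runs whatever command it is given until that command exits non-zero, so it assumes the recorder passes a crash through as a non-zero exit status.

```shell
# Hypothetical helper (not part of rr): run a command repeatedly until it
# exits non-zero. Prints the number of the failing attempt and returns 0;
# returns 1 if all max_runs attempts succeeded.
# Usage: run_until_fail MAX_RUNS COMMAND [ARGS...]
run_until_fail() {
    max_runs=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_runs" ]; do
        # Run the command exactly as given; stop at the first failure.
        if ! "$@"; then
            echo "$attempt"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

With MALLOC_PERTURB_ exported as above, something like the following should then leave a recorded trace of the crashing run, where ./my_test is a placeholder for the real test binary:

    run_until_fail 1000 rr record ./my_test
    rr replay   # replays the most recent recording in gdb

Note that each "rr record" run saves its own trace, so a long loop can leave a lot of recordings behind to clean up.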
