On Fri, 2017-05-26 at 10:48 +0200, Jiri Danek wrote:
> On Fri, May 26, 2017 at 12:18 AM, Alan Conway (JIRA) <[email protected]
> > wrote:
> >     [ https://issues.apache.org/jira/browse/DISPATCH-777?page=com.a
> > tlassian.jira.plugin.system.issuetabpanels:comment-
> > tabpanel&focusedCommentId=16025475#comment-16025475 ]
> > 
> > Alan Conway commented on DISPATCH-777:
> > --------------------------------------
> > 
> > This appears to be a race condition (double free) in the epoll
> > proactor, fixing...
> > 
> 
> Could you maybe describe in more detail how you went about triaging
> it? So that I know what more steps I can take next time I am
> reporting a crash like this. Thank you.
> -- 
> Jiří Daněk
> Messaging QA

I ran the test in a loop with 'rr' http://rr-project.org/ until it
crashed. 'rr' is a truly amazing extension to gdb - it records a
complete execution trace of the program (without imposing much run-time 
overhead) that you can replay forwards *and backwards* in gdb,
examining memory etc. as you normally would.
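
For reference, the "run it in a loop until it crashes" step can be sketched
roughly as below. This is only a sketch: run_until_fail is a hypothetical
helper, ./my_test is a placeholder for the real test binary, and it assumes
that 'rr record' exits non-zero when the recorded program fails.

```shell
# Rerun a command until it exits non-zero, then report how many runs passed.
# Hypothetical helper for illustration; it is not part of rr.
run_until_fail() {
    passes=0
    while "$@"; do
        passes=$((passes + 1))
    done
    echo "failed after $passes passing run(s)"
}

# Intended use (./my_test is a placeholder for the real test binary):
#   run_until_fail rr record ./my_test
#   rr replay    # replays the most recent recording under gdb
```

Each passing run leaves its own recording behind, so old traces may need
cleaning up after a long loop.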

So playing the program up to the segfault in rr showed me that it
crashed on a pointer with the value 0x4242424242. Now I have this in my
.bashrc:

export MALLOC_PERTURB_=66 # 0x42

With that setting glibc fills freed memory with the byte 0x42, so it
always shows up as the hex pattern 4242... Now I know the pointer was
read from memory that had previously been freed, so I do:

 watch -l ptr  # Standard gdb watchpoint on the pointer
 reverse-cont  # rr magic - continue *backwards* to the watchpoint

This runs the program back to the exact point where the memory holding
the pointer was last written, which is where it was freed!

The rest is knowledge of the code: the crash comes just after the
pointer was extracted from epoll_wait(), and the free happens during
finalization of a closed connection. So I'm fairly sure there's a race
where we sometimes free memory used by a connection while it is still
registered with epoll.



