Re: [jira] [Commented] (DISPATCH-777) [system_tests_drain] pn_object_free: corrupted double-linked list

Jiri Danek Fri, 26 May 2017 08:27:36 -0700

On Fri, May 26, 2017 at 5:15 PM, Alan Conway <[email protected]> wrote:


> On Fri, 2017-05-26 at 10:48 +0200, Jiri Danek wrote:
> > On Fri, May 26, 2017 at 12:18 AM, Alan Conway (JIRA) <[email protected]
> > > wrote:
> > >     [ https://issues.apache.org/jira/browse/DISPATCH-777?page=com.a
> > > tlassian.jira.plugin.system.issuetabpanels:comment-
> > > tabpanel&focusedCommentId=16025475#comment-16025475 ]
> > >
> > > Alan Conway commented on DISPATCH-777:
> > > --------------------------------------
> > >
> > > This appears to be a race condition (double free) in the epoll
> > > proactor, fixing...
> > >
> >
> > Could you maybe describe in more detail how you went about triaging
> > it? So that I know what more steps I can take next time I am
> > reporting a crash like this. Thank you.
> > --
> > Jiří Daněk
> > Messaging QA
>
> I ran the test in a loop with 'rr' http://rr-project.org/ until it
>

I stumbled upon it this morning when writing the question! I googled
https://www.google.cz/search?q=gdb+time+traveling+record+execution and this
was the second result from the top.


> crashed. 'rr' is a truly amazing extension to gdb - it records a
> complete execution trace of the program (without imposing much run-time
> overhead) that you can replay forwards *and backwards* in gdb,
> examining memory etc. as you normally would.
>
> So playing the program up to the segfault in rr showed me that it
> crashed on a pointer with the value 0x4242424242. Now I have this in my
> .bashrc:
>
> export MALLOC_PERTURB_=66 # 0x42
>
> So freed memory is always filled with the hex pattern 424242. Now I
> know the pointer is in memory that was previously freed so I do:
>
>  watch -l ptr  # Standard gdb watchpoint on the pointer
>  reverse-cont  # rr magic - continue *backwards* to the watchpoint
>
> This runs the program back to the exact point where it was freed!
>
> The rest is knowledge of the code: the crash comes just after the
> pointer was extracted from epoll_wait(), the free is during
> finalization of a closed connection - so I'm fairly sure there's a race
> where we sometimes free memory used by a connection while it is still
> registered with epoll.
>

So next time I report a crash like this, I will set export
MALLOC_PERTURB_=66 # 0x42 and attach the rr trace to Jira, assuming it can
be moved between computers and that it compresses reasonably well. I haven't
actually tried that yet.
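
For reference, the "run the test in a loop until it crashes" step could be
sketched roughly as below. This is only a sketch: the helper name
run_until_fail and the test command in the usage comment are hypothetical,
and it assumes that 'rr record' passes the exit status of the recorded
program through to the shell.

```shell
# Fill freed memory with 0x42 (glibc only), as in the .bashrc line above.
export MALLOC_PERTURB_=66

# Hypothetical helper: run a command up to MAX times and stop at the
# first failing run. Returns 0 if a failure was seen, 1 otherwise.
run_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
    return 1
}

# Hypothetical usage; replace the test command with the real one:
#   run_until_fail 100 rr record ./my_failing_test
# Afterwards 'rr replay' opens the most recent recording in gdb.
```

Stopping at the first failure matters because 'rr replay' without arguments
replays the latest trace, so the crashing run is the one that gets loaded.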
-- 
Jiří Daněk
Messaging QA
