I was able to reproduce it in rr and found the issue. TL;DR: the issue is at https://github.com/JuliaGraphics/Cairo.jl/blame/master/src/Cairo.jl#L625, where ownership of a cairo pointer is passed to julia, causing a double free.
Here's the rough process of my debugging. I'm not really sure how to summarize it though....

1. It aborts in cairo's `cairo_path_destroy`, so I first compiled cairo with debug symbols to make my life easier. (The function is pretty short, so reading the disassembly would have worked too.)
2. It is freeing `path->data`, so I added a watchpoint on it with `watch -l path->data` and used reverse-continue to find the point of assignment.
3. The assignment happens in cairo from a valid malloc, so `path->data` isn't corrupted.
4. Now it takes some guessing to figure out exactly what's wrong. I'm not sure how glibc stores its malloc metadata (it would help to know that), so I tried the naive thing: watch the intptr_t just before the malloc result (that's how julia stores the GC metadata) and run forward. None of the assignments to this location look suspicious (they are all in glibc, and the first hit isn't freeing this value).
5. So now I tried the brute-force way. The pointer (`path->data`) I see is `0x3746950`, so I simply set a conditional breakpoint to see when it's freed: `br free if $rdi == 0x3746950`. I use rdi to get the first argument since the glibc I have installed doesn't have detailed enough debug info.
6. After a long run (conditional breakpoints are really slow, which is why I didn't use one first) it hits the breakpoint in the julia GC when freeing an array. The array has the same data pointer as the one in question, and that's before the pointer is freed by cairo, so something is wrong with the creation of the array. Now simply watch `a->data` and go back again. I was lucky this time. If this hadn't worked, the next thing to try would have been reducing the code or running the GC more often, so that I could afford to look at the code more carefully instead of just catching events in the debugger.
7. As expected, it hits `jl_ptr_to_array`, and going up a frame it seems that the caller is supplying a cairo pointer and transferring the ownership, which is wrong.
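To make the ownership problem in step 7 concrete, here is a minimal sketch in plain C. Everything in it is a hypothetical stand-in, not cairo or julia code: `lib_path` plays the role of the cairo path that owns its buffer, `wrapped_array` plays the role of a GC-managed array created from a raw pointer, and `owns_data` models the ownership flag passed when wrapping. I'm assuming the fix has the same shape as here, i.e. wrap the pointer without taking ownership (or copy the data), so that only the library frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a C library object that owns its buffer. */
typedef struct {
    double *data;
    size_t  len;
} lib_path;

/* Hypothetical stand-in for a GC-managed array wrapping a raw pointer. */
typedef struct {
    double *data;
    size_t  len;
    int     owns_data; /* nonzero: the "GC" frees data on finalization */
} wrapped_array;

static int free_calls = 0; /* how many times a data buffer was released */

static void counted_free(void *p) {
    if (p) {
        free_calls++;
        free(p);
    }
}

lib_path *lib_path_create(size_t len) {
    lib_path *p = malloc(sizeof *p);
    if (!p) return NULL;
    p->data = calloc(len, sizeof *p->data);
    if (!p->data) {
        free(p);
        return NULL;
    }
    p->len = len;
    return p;
}

/* The library always frees its own buffer when the object is destroyed. */
void lib_path_destroy(lib_path *p) {
    counted_free(p->data);
    free(p);
}

/* Borrowing wrap: the array views the buffer but does not own it.
 * Setting owns_data to 1 here would reproduce the double free. */
wrapped_array wrap_borrowed(lib_path *p) {
    wrapped_array a = { p->data, p->len, 0 };
    return a;
}

/* What the "GC" does when the array becomes unreachable. */
void wrapped_array_finalize(wrapped_array *a) {
    if (a->owns_data) counted_free(a->data);
    a->data = NULL;
}

/* Runs one wrap / finalize / destroy cycle and returns the number of
 * times the data buffer was freed, or -1 if allocation failed. */
int demo_borrowed_wrap(void) {
    free_calls = 0;
    lib_path *p = lib_path_create(4);
    if (!p) return -1;
    wrapped_array a = wrap_borrowed(p);
    a.data[0] = 1.0;            /* still valid: the library owns the buffer */
    wrapped_array_finalize(&a); /* does not free, the array only borrows */
    lib_path_destroy(p);        /* the single, legitimate free */
    return free_calls;
}
```

Calling `demo_borrowed_wrap()` from a `main` should report exactly one free of the buffer. The borrowed view is only safe while the library object is alive, so if the array has to outlive it, copying the data into memory the GC owns is the other option.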
On Tue, Sep 13, 2016 at 3:36 PM, Yichao Yu <yyc1...@gmail.com> wrote:

> On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobinger <lobing...@gmail.com> wrote:
>
>> Hello colleague,
>>
>> On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote:
>>>
>>> On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger <lobi...@gmail.com> wrote:
>>>
>>>> Hello colleagues,
>>>>
>>>> i'm trying to find out, why this
>>>> ...
>>>> fails miserably. I guess, but cannot track it down right now: There is
>>>> something wrong in memory management of Cairo.jl that only shows up for
>>>> objects that could have been freed long ago and julia and libcairo have
>>>> different concepts of invalidation.
>>>>
>>>> Any blog/receipe/issue that deals with GC debugging?
>>>
>>> It's not too different from debugging memory issue in any other program.
>>> It usually helps (a lot) to reproduce under rr[1]
>>
>> Many thanks for pointing to this. I was aware it exists but wasn't aware
>> of their progress.
>>
>>> Other than that, it strongly depend on the kind of error and I've seen
>>> it happens due to almost all parts of the runtime and it's really hard to
>>> summarize.
>>
>> What do you mean with "happens due to almost all parts of the runtime" ?
>
> The general procedure is basically catch the failure and try to figure out
> why it got into this states. This means that you generally need to trace
> back where certain value is generated which also usually means that you
> need to trace back through a few layers of code and they might be scattered
> all over the place.