I was able to reproduce it in rr and found the issue.

TL;DR: the issue is at
https://github.com/JuliaGraphics/Cairo.jl/blame/master/src/Cairo.jl#L625,
where Cairo.jl passes ownership of a cairo-allocated pointer to julia, so
the same buffer is freed twice (once by the julia GC, once by cairo).
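To make the ownership problem concrete, here is a minimal C sketch, not the actual Cairo.jl or julia code. `array_view`, `wrap_borrowed`, `wrap_copy` and `array_finalize` are hypothetical names standing in for a GC-managed array wrapper. The point is only that a wrapper around a buffer some library will free itself must either not own it or own a private copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a GC-managed array; not julia's real layout. */
typedef struct {
    double *data;
    size_t  len;
    int     owns_data;   /* nonzero: the finalizer frees data */
} array_view;

/* Borrow: wrap a foreign buffer WITHOUT taking ownership.
   The library that allocated buf stays responsible for freeing it. */
array_view wrap_borrowed(double *buf, size_t len) {
    assert(buf != NULL || len == 0);
    array_view a = { buf, len, 0 };
    return a;
}

/* Copy: allocate a private buffer that the wrapper can safely own. */
array_view wrap_copy(const double *buf, size_t len) {
    array_view a = { NULL, 0, 1 };
    a.data = malloc(len * sizeof *a.data);
    if (a.data != NULL) {
        memcpy(a.data, buf, len * sizeof *a.data);
        a.len = len;
    }
    return a;
}

/* Finalizer: frees only what the wrapper owns.
   Returns 1 if it called free, 0 otherwise. */
int array_finalize(array_view *a) {
    int freed = 0;
    if (a->owns_data && a->data != NULL) {
        free(a->data);
        freed = 1;
    }
    a->data = NULL;
    a->len = 0;
    return freed;
}
```

Setting `owns_data` on a wrapper built directly from the library's buffer is the double-free pattern: the finalizer frees it, and later the library frees it again.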

Here's roughly how I debugged it. I'm not really sure how to summarize
it, though.

1. It aborts in cairo's `cairo_path_destroy`, so I first compiled cairo with
debug symbols to make my life easier. (The function is pretty short, so
reading the disassembly would have worked too.)
2. It is freeing `path->data`, so I added a watchpoint on it (`watch -l
path->data`) and used reverse-continue to find the point of assignment.
3. The assignment happens in cairo, from a valid malloc, so `path->data`
isn't corrupted.
4. Now it takes some guessing to figure out exactly what's wrong. I'm not
sure how glibc stores its malloc metadata (it would help to know that), so I
tried the naive thing: watch the intptr_t just before the malloc result
(that's how julia stores its GC metadata) and run forward. None of the
assignments to this location looked suspicious (they are all in glibc, and
the first hit isn't freeing this value).
5. So next I tried the brute-force way. The pointer (`path->data`) I see is
`0x3746950`, so I simply set a conditional breakpoint to see when it's
freed: `br free if $rdi == 0x3746950`. I use rdi to get the first
argument since the glibc I installed doesn't have that detailed debug info.
6. After a long run (conditional breakpoints are really slow, which is why I
didn't use one first) it hits the breakpoint in the julia GC while freeing
an array. The array's data pointer is the same as the one in question, and
this happens before cairo frees the pointer, so something is wrong with the
creation of the array. Now simply watch `a->data` and go back again.
I was lucky this time. If this hadn't worked, the next thing to try would
have been reducing the code or running the GC more often, so that I could
afford to look at the code more carefully instead of just catching events in
the debugger.
7. As expected, it hits `jl_ptr_to_array`, and going up a frame it seems
that the caller is supplying a cairo pointer and transferring ownership of
it, which is wrong.


On Tue, Sep 13, 2016 at 3:36 PM, Yichao Yu <yyc1...@gmail.com> wrote:

>
>
> On Tue, Sep 13, 2016 at 3:31 PM, Andreas Lobinger <lobing...@gmail.com>
> wrote:
>
>> Hello colleague,
>>
>> On Tuesday, September 13, 2016 at 7:25:38 PM UTC+2, Yichao Yu wrote:
>>>
>>>
>>> On Tue, Sep 13, 2016 at 12:49 PM, Andreas Lobinger <lobi...@gmail.com>
>>> wrote:
>>>
>>>> Hello colleagues,
>>>>
>>>> i'm trying to find out, why this
>>>> ...
>>>>
>>>> fails miserably. I guess, but cannot track it down right now: There is
>>>> something wrong in memory management of Cairo.jl that only shows up for
>>>> objects that could have been freed long ago and julia and libcairo have
>>>> different concepts of invalidation.
>>>>
>>>> Any blog/receipe/issue that deals with GC debugging?
>>>>
>>>
>>> It's not too different from debugging memory issue in any other program.
>>> It usually helps (a lot) to reproduce under rr[1]
>>>
>>
>> Many thanks for pointing to this. I was aware it exists but wasn't aware
>> of their progress.
>>
>>
>>> Other than that, it strongly depend on the kind of error and I've seen
>>> it happens due to almost all parts of the runtime and it's really hard to
>>> summarize.
>>>
>>
>> What do you mean with "happens due to almost all parts of the runtime" ?
>>
>
> The general procedure is basically to catch the failure and try to figure
> out why it got into this state. This means that you generally need to trace
> back where a certain value was generated, which usually also means tracing
> back through a few layers of code that might be scattered all over the
> place.
>
>
