On Tue, Mar 15, 2016 at 2:18 PM, 'Bill Hart' via julia-users
<[email protected]> wrote:
>
> Thanks for your answers. Please see my followup questions below.
>
> On 15 March 2016 at 18:45, Yichao Yu <[email protected]> wrote:
>>
>>
>> On Mar 15, 2016 11:56 AM, "'Bill Hart' via julia-users"
>> <[email protected]> wrote:
>> >
>> > We have been trying to understand the garbage collector behaviour, since
>> > we had some code for which our machine is running out of memory in a matter
>> > of an hour.
>> >
>> > We already realised that Julia isn't responsible for memory we allocate
>> > on the C side unless we use jl_gc_counted_malloc, which we now do
>> > everywhere. But it still uses masses of memory where we were roughly
>> > expecting no growth in memory usage (lots of short-lived objects and
>> > nothing much else).
>> >
>> > The behaviour of the gc on my machine seems to be to allocate objects
>> > until 23mb of memory is allocated, then do a jl_gc_collect. However, even
>> > after reading as much of the GC code in C as I can, I still can't determine
>> > why we are observing the behaviour we are seeing.
>> >
>> > Here is a concrete example whose behaviour I don't understand:
>> >
>> >      function doit2(n::Int)
>> >          s = BigInt(2234567876543456789876545678987654567898765456789876545678)
>> >          for i = 1:n
>> >             s += i
>> >          end
>> >          return s
>> >       end
>> >
>> >      doit(10000000000)
>> >
>> >
>> > This is using Julia's BigInt type which is using a GMP bignum. Julia
>> > replaces the GMP memory manager functions with jl_gc_counted_malloc, so
>> > indeed Julia knows about all the allocations made here.
>> >
>> > But what I don't understand is that the memory usage of Julia starts at
>> > about 124mb and rises up to around 1.5gb. The growth is initially fast and
>> > it gets slower and slower.
>>
>> I can't really reproduce this behavior.
>>
>> I assume doit is doit2 and not another function you defined somewhere
>> else.
>
>
> Yes, I meant doit2 of course.
>
> I'm really amazed you can't reproduce this. I can reproduce it over a wide
> variety of Julia versions on a variety of machines!
>
> I'm monitoring memory usage with top, not with Julia itself.

I'm using htop.
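
For anyone who wants to cross-check the top/htop numbers from inside the
process, here is a minimal sketch. It assumes a later Julia release where
Sys.maxrss() is available (that name does not appear in this thread), and it
uses a much smaller n than the original call so that it finishes quickly.

```julia
# Same loop as doit2 from the quoted message.
function doit2(n::Int)
    s = BigInt(2234567876543456789876545678987654567898765456789876545678)
    for i = 1:n
        s += i          # allocates a fresh BigInt on every iteration
    end
    return s
end

doit2(1)                            # run once so compilation is not measured
rss_before = Int(Sys.maxrss())      # peak resident set size so far, in bytes
result = doit2(10^6)                # assumption: small n, only for a quick check
rss_after = Int(Sys.maxrss())

println("peak RSS grew by ", (rss_after - rss_before) / 2^20, " MiB")
```

Since this reads the peak resident set size, the difference can only show
growth, not memory handed back to the OS.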

>>
>>
>> >
>> > Can someone explain why there is this behaviour? Shouldn't jl_gc_collect
>> > be able to collect every one of those allocations every time it reaches the
>> > collect_interval of 23mb (which remains constant on my machine with this
>> > example)?
>> >
>> > As an alternative experiment, I implemented a kind of bignum type using
>> > Julia arrays of UInts which I then pass to GMP low level mpn functions
>> > (which don't do any allocations on the C side). I only implemented the +
>> > operator, just enough to make this example work.
>> >
>> > The behaviour in this case is that memory usage is constant at around
>> > 124mb. There is no growth in memory usage over time.
>> >
>> > Why is the one example using so much memory and the other is not?
>> >
>> > Note that the bignums do not grow here. They are always essentially 3 or
>> > 4 limbs or something like that, in both examples.
>> >
>> > Some other quick questions someone might be able to answer:
>> >
>> > * Is there any difference in GC behaviour between using & vs Ref in
>> > ccall?
>>
>> Ref is heap allocated. & is not.
>
> Do you mean the Ref object itself is heap allocated but the & object is
> stack allocated? Or are you referring to a GC "heap" vs a pool or something?

heap as in allocated through the GC.
& does not create a new object and is a special syntax for ccall
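
A minimal sketch of the Ref form, to make the "allocated through the GC" part
concrete. The choice of memcpy and the variable names are only for
illustration; the point is that src and dst are ordinary Julia objects and
ccall passes pointers to their contents.

```julia
# Two Ref objects; each is a small GC-managed box holding one Int.
src = Ref{Int}(42)
dst = Ref{Int}(0)

# Declaring the argument types as Ref{Int} makes ccall pass a pointer to the
# boxed value and keeps both objects rooted for the duration of the call.
ccall(:memcpy, Ptr{Cvoid}, (Ref{Int}, Ref{Int}, Csize_t),
      dst, src, sizeof(Int))

println(dst[])   # the C side wrote through the pointer into dst
```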

>
> Why was & deprecated? It sounds like & should be faster, no?

Yes, it is, and that's why we don't have a depwarn for it. The
performance of Ref can be brought on par with & with some compiler
optimizations (search for stack allocation on the issue tracker) and I
don't think we'll fully deprecate & before that.

>>
>> >
>> > * Does the Julia GC have a copying allocator for the short lived
>> > generation(s)?
>>
>> It has two generations and does not copy.
>
> So do the pools constitute the new generation? Or is the new generation
> separate from the pools?

They are all in the same pool

>
> I noticed in the code that it looks like things get promoted if they survive
> more than 2 generations (currently). But what happens to the objects between

more than 2 collections*

> generations 1 and 2 if they are not copied?

See the lifetime diagram in gc.c; they are marked differently.

>
> Or do you mean it has 2 generations plus a long term generation?

See also 
https://github.com/yuyichao/explore/blob/master/julia/new_gc/bit_swap.md
if you are interested.

>>
>> >
>> > * Does the Julia GC do a full mark and sweep every collection?
>>
>> No
>>
>> > Even of the long lived generation(s)? If not, which part of the GC code
>> > is responsible for deciding when to do a more involved sweep vs a faster
>> > sweep. I am having some trouble orienting myself with the code, and I'd
>> > really like to understand it a bit better.
>>
>> The counters in _jl_gc_collect
>
> You mean quick_count?
>
> In my version of Julia in gc.c, quick_count is set to 0 and incremented, but
> never used anywhere. How does this affect the behaviour? Is it used
> somewhere else in Julia?
>
> Or do you mean inc_count? I see this determines scanned_bytes_goal, which
> again is not used anywhere that I can see.
>
> I can't see anywhere else that inc_count is used either.

There are a lot of heuristics. See the logic that sets `sweep_mask =
GC_MARKED`. I believe I've recently removed those unused counters on
master.
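
The quick-versus-full distinction can also be poked at from user code. A
minimal sketch, assuming a later Julia release where GC.gc takes a `full`
flag (not something mentioned in this thread). It only requests a kind of
collection; what the collector does on its own is still up to the heuristics
in gc.c.

```julia
# Keep some objects alive so the collector has something to trace.
data = [BigInt(i) for i in 1:10^4]

GC.gc(false)   # request an incremental collection (may only scan young objects)
GC.gc(true)    # request a full collection (sweeps all objects)

total = sum(data)   # data is still reachable, so it survives both collections
println(total)
```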

>>
>>
>> >
>> > * Can someone confirm whether the "pools" mentioned in the GC code refer
>> > to pools for different sized allocations. Are there multiple pools for the
>> > same sized allocation, or did I misunderstand that?
>>
>> Pools are for small objects and they are segregated by size.
>
> Thanks.
>
>>
>> >
>> > Thanks in advance.
>> >
>> > Bill.
>> >
>> >
>> >
>
>
