On Tue, Mar 15, 2016 at 1:45 PM, Yichao Yu wrote:
>
> On Mar 15, 2016 11:56 AM, "'Bill Hart' via julia-users" wrote:
>>
>> We have been trying to understand the garbage collector behaviour, since
>> we had some code for which our machine runs out of memory within about
>> an hour.
>>
>> We already realised that Julia's GC doesn't account for memory we allocate
>> on the C side unless we use jl_gc_counted_malloc, which we now do everywhere.
>> But it still uses masses of memory where we were roughly expecting no growth
>> in memory usage (lots of short-lived objects and nothing much else).
>>
>> The behaviour of the GC on my machine seems to be to allocate objects
>> until 23 MB of memory is allocated, then do a jl_gc_collect. However, even
>> after reading as much of the GC code in C as I can, I still can't determine
>> why we are observing the behaviour we are seeing.
>>
>> Here is a concrete example whose behaviour I don't understand:
>>
>>     function doit2(n::Int)
>>         s = BigInt(2234567876543456789876545678987654567898765456789876545678)
>>         for i = 1:n
>>             s += i
>>         end
>>         return s
>>     end
>>
>>     doit(10000000000)
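A rough way to see how many short-lived objects the quoted loop creates is to compare it with the same loop over plain Int values. This is a sketch for a current Julia 1.x, not the 2016 Julia used in this thread; `bigint_loop` and `int_loop` are hypothetical names, and the exact byte counts will vary by version and platform. Each `s += i` on a BigInt returns a new BigInt, so the BigInt loop should report allocations that grow with `n`, while the Int loop should report little or nothing.

```julia
# Sketch for Julia 1.x; bigint_loop and int_loop are hypothetical names.

function bigint_loop(n::Int)
    s = BigInt(2234567876543456789876545678987654567898765456789876545678)
    for i = 1:n
        s += i      # `+` on a BigInt allocates a fresh BigInt every iteration
    end
    return s
end

function int_loop(n::Int)
    s = 0
    for i = 1:n
        s += i      # machine integers: no heap allocation needed
    end
    return s
end

# Call each function once so compilation is not included in the measurement.
bigint_loop(10); int_loop(10)

bytes_big = @allocated bigint_loop(100_000)
bytes_int = @allocated int_loop(100_000)

println("BigInt loop: ", bytes_big, " bytes allocated")
println("Int loop:    ", bytes_int, " bytes allocated")
```

In current Base, each BigInt also registers a finalizer that calls GMP's `mpz_clear` to release its limbs, so every iteration leaves behind one more finalizer-carrying object for the collector to deal with; a representation built on plain Julia arrays needs no finalizers at all.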

Another note is that adding finalizers will (currently) extend the lifetime
of an object. https://github.com/JuliaLang/julia/pull/13995 should solve this
problem, but I'm holding off on merging it until we finish some other GC rework.

>>
>> This is using Julia's BigInt type, which is backed by a GMP bignum. Julia
>> replaces the GMP memory manager functions with jl_gc_counted_malloc, so
>> Julia does know about all the allocations made here.
>>
>> But what I don't understand is that the memory usage of Julia starts at
>> about 124 MB and rises to around 1.5 GB. The growth is fast at first and
>> then gets slower and slower.
>
> I can't really reproduce this behavior.
>
> I assume doit is doit2 and not another function you defined somewhere else.
>
>>
>> Can someone explain this behaviour? Shouldn't jl_gc_collect be able to
>> collect every one of those allocations each time it reaches the
>> collect_interval of 23 MB (which remains constant on my machine with this
>> example)?
>>
>> As an alternative experiment, I implemented a kind of bignum type using
>> Julia arrays of UInts, which I then pass to the low-level GMP mpn functions
>> (which don't do any allocations on the C side). I only implemented the +
>> operator, just enough to make this example work.
>>
>> In this case memory usage stays constant at around 124 MB. There is no
>> growth in memory usage over time.
>>
>> Why does one example use so much memory while the other does not?
>>
>> Note that the bignums do not grow here. They are always roughly 3 or 4
>> limbs, in both examples.
>>
>> Some other quick questions someone might be able to answer:
>>
>> * Is there any difference in GC behaviour between using & vs Ref in ccall?
>
> Ref is heap allocated. & is not.
>
>>
>> * Does the Julia GC have a copying allocator for the short-lived
>> generation(s)?
>
> It has two generations and does not copy.
>
>>
>> * Does the Julia GC do a full mark and sweep every collection?
> > No > >> Even of the long lived generation(s)? If not, which part of the GC code is >> responsible for deciding when to do a more involved sweep vs a faster sweep. >> I am having some trouble orienting myself with the code, and I'd really like >> to understand it a bit better. > > The counters in _jl_gc_collect > > >> >> * Can someone confirm whether the "pools" mentioned in the GC code refer >> to pools for different sized allocations. Are there multiple pools for the >> same sized allocation, or did I misunderstand that? > > Pools are for small objects and they are segregated by size. > >> >> Thanks in advance. >> >> Bill. >> >> >>
