Re: Some questions about GHC source code and wiki

Simon Marlow Thu, 27 May 2010 02:39:27 -0700

On 27/05/2010 00:08, Marco Túlio Gontijo e Silva wrote:

I've made some patches to GHC to correct these issues.  I don't know if you're
willing to recieve these kind of small patches, but I'll send them anyway.


I didn't see any patches - did you forget to attach them?

I thought there would be a central point (function) for general object
allocation.  The garbage collector just allocates new objects when it's
copying from one generation to the another one, right?


In the garbage collector, space for copying objects is allocated using

StgPtr alloc_for_copy (nat size, generation *gen);

in rts/sm/Evac.c.

The mutator allocates objects inline, by bumping the heap pointer (Hp).
   Allocation of objects is such a common operation it needs to be
incredibly cheap.


But what happens when the heap pointer hits the end of a Block?  In this case
it should not just bump the pointer.

Right - before allocating, the mutator does a "heap check", comparing Hpwith HpLim. If the heap check fails, then the mutator has to arrange tosave its current state and return to the runtime system requesting a GC.There are quite a lot of details in how this happens, but the generalidea is that one of the functions in rts/HeapStackCheck.cmm gets called,which returns to the RTS.

The nursery is a linked list of blocks, so when a heap check fails wemight be able to just start allocating into the next block rather thanrequesting a GC: see the code in HeapStackCheck.cmm.

Ok.  There's also the corner case where the object starts in the beginning of
the Line, in which we can consider the next Line to be empty, but probably it's
not an issue to ignore this one.


That's a worthwhile optimisation, I think.

We might consider "medium" objects to be any object that is larger than,say 75% of a line. The point is we want to avoid having objects thatare nearly a line size being allocated in the same way as small objects,because one of these objects will almost always cause allocation in thecurrent line to fail, which might waste up to a line's worth of memoryin the current line. That's what I menat by fragmentation.

That fraction is something we could experiment with, maybe even 50% ismore sensible.

We'll need to mark differently Lines that are not empty because they have an
objects and Lines that are not empty because the previous Line had an object.
I'm saying this because I got confused in the first time I read "a Line is only
empty if the Line preceding it is also empty", thinking of an infinite
regression.

I'm proposing that we don't have explicit line marks at all, we just usethe object mark bitmap (we might have to expand this to a bytemap forparallel GC).

Anyway, this is proposed in the Immix paper.  In their terms, they mark only
the one Line for small objects and they don't mark the last Line of medium
objects, and ignore the first Line of each role in allocation.

I suggest we allocate medium objects in a separate space entirely, andmaybe just copy them as normal, rather than using mark/sweep on them.In GHC's GC we can choose on a block-by-block basis whether the block iscopied during GC or marked+swept; this is how we can implement"opportunistic defragmentation". Take a look at the current sweepingalgorithm in rts/sm/Sweep.c to see what I mean.

Then we need a representation for the free list of Lines, probably just
link them together in a list within a block.  The blocks in the old
generation are already on a list attached to the generation, so we just
need a pointer to the first free Line from the generation.


So you're proposing that the last free Line of a Block points to the first free
Line of the next one?

That's one option, yes. I'm not sure if its the easiest, maybe you canexplore the alternatives here.

That covers (1).  For (2), you'll need to change the way that to-space
memory is allocated, but only for the old generation.  If there is a
free Line then use that.  We'll also need to be able to allocate space
for objects larger than a Line (or maybe larger than a certain fraction
of a Line, to avoid too much fragmentation).


I didn't got the part in the parenthesis.


Hopefully I explained that above...

Identifying objects that are larger than a Line can be done manually, but it
will probably be better in the long run if these objects have a special flag
in the info table.


Yes.  Immix authors propose that the differentiation between objects smaller
than a Line (Small) and larger than a Line but much smaller than a Block
(Medium) is done in allocation time, to save time of the GC.  They propose that
the Medium objects are objects smaller than a quarter of the Block size.  If
the object is bigger than this, it'll be allocated differently.  How is this
currently done in GHC?


See Thomas's answer on this one.

Ok, that's a sketch of what I had in mind, there are quite a few gaps
still to be filled in, and I'm sure you'll encounter some things I
haven't considered.


Thanks for that, things are much more clear for me now.  There're so many
new things too look at that I'm feeling a bit lost in the RTS code.  More focus
is welcome.  So, initially I must look at the code of the major garbage
collection, and think of a representation of the list of free blocks of Lines.

If you're a bit lost, one way to get acquainted is to start adding printstatements to the code so you can get a better idea of what's going onat each point. e.g., you might want to try printing out each object asit is evacuated. I find this kind of hands-on approach much moreilluminating than staring at the code.

Maybe the start of the block can hold the size of the block (in number of
Lines) and a pointer to the next block?

This is an alternative to the linked list of free lines I describedabove, yes.

I'll study more how the sweep works currently.  Is there something on this
subject, which could complement the source code?  Maybe an article on how this
is done...

The Jones/Lins book is the best way to learn about GC basics, or if youcan't get hold of that easily try Paul Wilson's survey paper:


https://ritdml.rit.edu/bitstream/handle/1850/5112/PWilsonProceedings1992.pdf?sequence=1

Cheers,
        Simon

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc

Re: Some questions about GHC source code and wiki

Reply via email to