Re: Some questions about GHC source code and wiki

Simon Marlow Tue, 25 May 2010 02:32:10 -0700

On 24/05/2010 23:02, Marco Túlio Gontijo e Silva wrote:

Hi.


I was accepted as a student to the Google Summer of Code to work on
Implemmenting the Immix Garbage Collector in GHC.  I'm currently reading the
Commentary and the source code in order to understand what needs to be
changed.  I have some questions which are not central to my work, but I thought
I should ask them along with the ones that are, since they may be a source of
doubt that would make it harder for me to work in the code.

-------------------------------------------------------------------------------

From http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/BlockAlloc

:
(...)

Currently, megablocks are never freed back to the OS, except at the end of
the program. This is a potential improvement that could be made.


What's the problem with freeing megablocks to the OS?  Why isn't it done
already?

No big problems, except that you have to be slightly careful to avoidthrashing - freeing too much and then having to re-allocate it againalmost immediately, which is easy to get wrong with a copying GC becausememory use spikes during a GC.


Ian is looking into this, I believe:

  http://hackage.haskell.org/trac/ghc/ticket/698

(...)

From Faster laziness using dynamic pointer tagging:

(...)

Return adresses have info tables just as heap closures do; this makes stack
frames have almost exactly the same layout as heap closures.  The info table
for a return address describes the layout of the stack fram for the garbage
collector, just as the info table for a heap closure describes the layout of
the closure.


Is the GC responsible for collecting the Stack objects, or it only scavenges
them to find pointers to Heap objects?

A stack is part of a TSO object, and TSOs are managed by the GC justlike other objects. Stacks and hence TSOs are mutable, of course.

(...)

From includes/rts/storage/GC.h:

(...)

typedef struct generation_ {
     unsigned int   no;                 // generation number


Why does the generation hold its own number?  Can't it be obtained by "this" -
generations?


It's just a convenience.

(...)

     unsigned int   n_words;             // number of words


Is this the total number of words in the all the blocks?  Can't it also be
obtained by n_blocks * (1<<  BLOCK_SHIFT)?


No, because most blocks are not completely full.

(...)

extern generation * g0;
extern generation * oldest_gen;


Can't generations[0] and generations[RtsFlags.GcFlags.generations-1] be used?


Again, just convenience.

(...)

From rts/sm/GC.c :

(...)

        nat gens = RtsFlags.GcFlags.generations;


I've seen some times this kind of construction in GHC, where a variable is
create for an assignment like this, and never changed again.  Are these kind of
constructions used only for efficiency?

Probably just to save typing! If gens is never modified, thedeclaration should really have a 'const' modifier on it.

(...)

From rts/sm/BlockAlloc.c:

(...)

     - most allocations are for small blocks


The block size is constant, so this should be for a small number of blocks,
instead of for small blocks?


Yes.

(...)
-------------------------------------------------------------------------------

Ok, these were the questions which are not central to my work.  The central
question I have now follows.

Please correct me if I'm wrong, but it seems to me that the memory allocator
goes as follows:

         _________
        ( Program )
         ---------
             ^
             |
   +------------------+
   | Object allocator |
   +------------------+
             ^
             |
    +-----------------+
    | Block allocator |
    +-----------------+
             ^
             |
+----------------------+
| Mega block allocator |
+----------------------+
             ^
             |
           ____
          ( OS )
           ----

Yes; I presume by the "Object allocator" you mean all the parts of thestorage manager that sit on top of the block allocator, including thegarbage collector.

I could find documentation about the Mega block allocator and the Block
allocator, but not of the Object allocator.  Actually, I made it up myself,
since I saw nothing written about it, but I wonder that there should be some
code that allocates objects in a block and request a new block when one is
needed.

The mutator allocates objects inline, by bumping the heap pointer (Hp).Allocation of objects is such a common operation it needs to beincredibly cheap.

I think that the first thing I should do is create a Line allocator, between
the Block allocator and the Object allocator.  But for this, I need to know
where the usual object allocation is done.

I don't think we need a separate Line allocator. Let's talk about thehigh level design, here's what I had in mind:

The mutator stays exactly as it is. We're going to use Immix for theold generation only, in the same way that mark-compact and mark-sweepare possible collection strategies for the old generation right now.

To make things really easy, we could even leave the marking phaseexactly as it is, using a mark bitmap. The only things we need tochange are:


 1. the way that we free memory from the old generation after
    a major GC, and

 2. the way that we allocate memory for copying objects during
    minor and major GC

We want to free memory in units of a Line, and allocate memory from thepool of available Lines (or from the block allocator if no free Linesare available).

To obtain free Lines, we can walk over the bitmap after a major GC andidentify any free Lines. Remember that we only mark the start of anobject, so an object may extend into the next Line (let's limitourselves to objects no larger than a Line for now), so a Line is onlyempty if the Line preceding it is also empty.

Then we need a representation for the free list of Lines, probably justlink them together in a list within a block. The blocks in the oldgeneration are already on a list attached to the generation, so we justneed a pointer to the first free Line from the generation.

That covers (1). For (2), you'll need to change the way that to-spacememory is allocated, but only for the old generation. If there is afree Line then use that. We'll also need to be able to allocate spacefor objects larger than a Line (or maybe larger than a certain fractionof a Line, to avoid too much fragmentation). Identifying objects thatare larger than a Line can be done manually, but it will probably bebetter in the long run if these objects have a special flag in the infotable.

Ok, that's a sketch of what I had in mind, there are quite a few gapsstill to be filled in, and I'm sure you'll encounter some things Ihaven't considered.

Another thing: is this the right list to this kind of question?  Or should I
mail glasgow-haskell-users?


Either list is fine...

Cheers,
        Simon

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc

Re: Some questions about GHC source code and wiki

Reply via email to