Re: Some questions about GHC source code and wiki

Thomas Schilling Wed, 26 May 2010 19:28:25 -0700

Just a few bits, Simon will be able to help you with the rest.

On 27 May 2010 00:08, Marco Túlio Gontijo e Silva <mar...@debian.org> wrote:
> Hi Simon.
>
> Excerpts from Simon Marlow's message of Ter Mai 25 06:31:50 -0300 2010:
>> On 24/05/2010 23:02, Marco Túlio Gontijo e Silva wrote:
> (...)
>> >> From includes/rts/storage/GC.h:
> (...)
>> >>      unsigned int   n_words;             // number of words
>> >
>> > Is this the total number of words in the all the blocks?  Can't it also be
>> > obtained by n_blocks * (1<<  BLOCK_SHIFT)?
>>
>> No, because most blocks are not completely full.
> (...)
>> >> From rts/sm/GC.c :
>> > (...)
>> >>     nat gens = RtsFlags.GcFlags.generations;
>> >
>> > I've seen some times this kind of construction in GHC, where a variable is
>> > create for an assignment like this, and never changed again.  Are these 
>> > kind of
>> > constructions used only for efficiency?
>>
>> Probably just to save typing!  If gens is never modified, the
>> declaration should really have a 'const' modifier on it.
>>
>> > (...)
>> >> From rts/sm/BlockAlloc.c:
>> > (...)
>> >>      - most allocations are for small blocks
>> >
>> > The block size is constant, so this should be for a small number of blocks,
>> > instead of for small blocks?
>>
>> Yes.
>
> I've made some patches to GHC to correct these issues.  I don't know if you're
> willing to recieve these kind of small patches, but I'll send them anyway.
>
>> > (...)
>> > -------------------------------------------------------------------------------
>> >
>> > Ok, these were the questions which are not central to my work.  The central
>> > question I have now follows.
>> >
>> > Please correct me if I'm wrong, but it seems to me that the memory 
>> > allocator
>> > goes as follows:
>> >
>> >          _________
>> >         ( Program )
>> >          ---------
>> >              ^
>> >              |
>> >    +------------------+
>> >    | Object allocator |
>> >    +------------------+
>> >              ^
>> >              |
>> >     +-----------------+
>> >     | Block allocator |
>> >     +-----------------+
>> >              ^
>> >              |
>> > +----------------------+
>> > | Mega block allocator |
>> > +----------------------+
>> >              ^
>> >              |
>> >            ____
>> >           ( OS )
>> >            ----
>>
>> Yes; I presume by the "Object allocator" you mean all the parts of the
>> storage manager that sit on top of the block allocator, including the
>> garbage collector.
>
> I thought there would be a central point (function) for general object
> allocation.  The garbage collector just allocates new objects when it's
> copying from one generation to the another one, right?


The central functions are evacuate in sm/Evac.c and scavange in
sm/Scav.c.  I suggest you start looking at how these work.  The basic
idea is:  evacuate copies an object from the nursery into the mature
generation and then calls scavenge on all the pointers in the copied
object.

>
>> > I could find documentation about the Mega block allocator and the Block
>> > allocator, but not of the Object allocator.  Actually, I made it up myself,
>> > since I saw nothing written about it, but I wonder that there should be 
>> > some
>> > code that allocates objects in a block and request a new block when one is
>> > needed.
>>
>> The mutator allocates objects inline, by bumping the heap pointer (Hp).
>>   Allocation of objects is such a common operation it needs to be
>> incredibly cheap.
>
> But what happens when the heap pointer hits the end of a Block?  In this case
> it should not just bump the pointer.

Every basic block that performs some allocation is preceded by a heap
check.  If this test fails, a special RTS routine is invoked.  I'm not
sure which one this is.  Simon can tell you.

>
> (...)
>> To obtain free Lines, we can walk over the bitmap after a major GC and
>> identify any free Lines.  Remember that we only mark the start of an
>> object, so an object may extend into the next Line (let's limit
>> ourselves to objects no larger than a Line for now), so a Line is only
>> empty if the Line preceding it is also empty.
>
> Ok.  There's also the corner case where the object starts in the beginning of
> the Line, in which we can consider the next Line to be empty, but probably 
> it's
> not an issue to ignore this one.
>
> We'll need to mark differently Lines that are not empty because they have an
> objects and Lines that are not empty because the previous Line had an object.
> I'm saying this because I got confused in the first time I read "a Line is 
> only
> empty if the Line preceding it is also empty", thinking of an infinite
> regression.
>
> Anyway, this is proposed in the Immix paper.  In their terms, they mark only
> the one Line for small objects and they don't mark the last Line of medium
> objects, and ignore the first Line of each role in allocation.

The idea is to use Immix only for the mature generation.  Because
evacuate already needs to look at the info table (except in a few
specially optimised cases) so you'll know the size of the object as
well.  It should be possible to combine line marking with this.

>
>> Then we need a representation for the free list of Lines, probably just
>> link them together in a list within a block.  The blocks in the old
>> generation are already on a list attached to the generation, so we just
>> need a pointer to the first free Line from the generation.
>
> So you're proposing that the last free Line of a Block points to the first 
> free
> Line of the next one?
>
>> That covers (1).  For (2), you'll need to change the way that to-space
>> memory is allocated, but only for the old generation.  If there is a
>> free Line then use that.  We'll also need to be able to allocate space
>> for objects larger than a Line (or maybe larger than a certain fraction
>> of a Line, to avoid too much fragmentation).
>
> I didn't got the part in the parenthesis.
>
>> Identifying objects that are larger than a Line can be done manually, but it
>> will probably be better in the long run if these objects have a special flag
>> in the info table.
>
> Yes.  Immix authors propose that the differentiation between objects smaller
> than a Line (Small) and larger than a Line but much smaller than a Block
> (Medium) is done in allocation time, to save time of the GC.  They propose 
> that
> the Medium objects are objects smaller than a quarter of the Block size.  If
> the object is bigger than this, it'll be allocated differently.  How is this
> currently done in GHC?
>
>> Ok, that's a sketch of what I had in mind, there are quite a few gaps
>> still to be filled in, and I'm sure you'll encounter some things I
>> haven't considered.
>
> Thanks for that, things are much more clear for me now.  There're so many
> new things too look at that I'm feeling a bit lost in the RTS code.  More 
> focus
> is welcome.  So, initially I must look at the code of the major garbage
> collection, and think of a representation of the list of free blocks of Lines.
>
> Maybe the start of the block can hold the size of the block (in number of
> Lines) and a pointer to the next block?
>
> I'll study more how the sweep works currently.  Is there something on this
> subject, which could complement the source code?  Maybe an article on how this
> is done...
>
> Greetings.
> (...)
> --
> marcot
> http://wiki.debian.org/MarcoSilva
>



-- 
Push the envelope.  Watch it bend.

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc

Re: Some questions about GHC source code and wiki

Reply via email to