On Dec 16, 2009, at 5:50 AM, Serguey Zefirov wrote:

> 2009/12/16 Matt Morrow <moonpa...@gmail.com>:
>> What are peoples' thoughts on this?
>> http://hackage.haskell.org/trac/ghc/ticket/650#comment:16
> 
> I think it won't get any better.
> 
> Either we have O(log(N)) updates because we have to update
> hierarchical structure to speed up GC scanning (to get it to
> O(Mlog(N)), where M is a number of updated cells), or we have O(N)
> scanning.
> 
> As far as I can tell, other systems (Java, for example) suffer from
> that problem as well.

The ticket suggests using VM protection to track writes.  This has been tried a 
number of times over the years by the GC community, and has usually gone badly 
- it turns out getting the OS to tell you about the page fault in reasonable 
time is hard.  It usually turns out to be cheaper just to make a cheap write 
barrier.

There are tricks that let us avoid re-scanning an array frequently during 
generational GC (and in particular avoid the problem of re-scanning the entire 
array during minor GC just to catch a single write).  But they require that we 
design appropriate read or write barriers for array accesses.  This is a Simple 
Matter of Programming, but hasn't risen to the top of anyone's priority list 
(in spite of the fact that this bug has existed for over a decade now, as far 
as I know).  It's fiddly coding and annoying to debug.

For those who are curious, here's one trick that avoids repeated re-scanning of 
arrays during GC:
  * During post-allocation tracing, move all data pointed to by the array into 
contiguous chunks of memory.
  * Group together that memory logically with the memory containing the array 
itself.
  * Keep track of whether anything points in to this memory; if nothing does, 
free the lot of it (or use the bad old tracing method; you won't find the big 
array and won't pay anything).
  * If there are subsequent writes, trace those and move them to the region.
  * Re-scan the whole region only if there have been enough writes (presumably 
causing the region to fill with data that has been overwritten and thrown away).

Here the cost is roughly proportional to the number of writes you perform.  
Haskell being lazy, that might be more than you would expect.  There are (lots 
of) other tricks along the same lines with differing tradeoffs, and naturally 
I've skimmed over some important details.  What you should take away is that 
GCing an array of pointers need not be expensive---we can do O(writes) scanning 
over its lifespan, rather than O(N) scanning work at every single GC.  And note 
that in general we don't pay any GC costs at all unless we actually keep the 
array around for a while.

-Jan-Willem Maessen

> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to