Am 22.02.2012 20:40, schrieb H. S. Teoh:
On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
As I'm not satisfied with the current GC D has and don't see the
situation improving in the future without significant changes to the
compiler I wrote the following document that points out all the
possible issues with garbage collection I could think of and
possible solutions for them. This document is just a draft and only
a proposal, critics and comments are welcome.

Have you seen this?

        http://www.llucax.com.ar/proj/dgc/index.html

Yes I know about dgc it is better but still not on par with for example the GC that is shipped with the .NET 4.0 All I'm saying is that without propper support from the compiler we are not going to get GCs as good as in other modern languages.



[...]
2) Tracking references on the stack:

The D compiler always needs to emit a full stack frame so that the
GC can walk up the stack at any time in the program. The stack frame
of every function generated by the D compiler starts which a
bitfield (usually the size of a machine register) where each bit
indicates that these bytes are a pointer / reference. The bitfield
needs to be large enough to cover the whole stack frame of the
function.

This adds a lot of overhead to the runtime stack, esp. if you have deep
recursion. It's also not necessarily faster, since the GC now has to
parse a bitfield (a variable-length encoded bitfield, no less), instead
of just scanning words directly, which can be optimized by CPU-specific
microcode depending on the target platform.


If you have a better idea for percise stack scanning I'm open for suggestions.


[...]
Every scope generated by the D compiler would need additional code
at the start and end of the scope. When the scope is entered the
bitfield would be patched to represent the new variables inside the
scope and when the scope is left the bitfield is patched again to
remove the changes that were made on entering the scope.

This would introduce quite a lot of overhead per scope. It will also
lead to strange things like:

        if (x) y();     // faster
        if (x) { y(); } // slower

which will encourage people to omit {} after if, which makes code more
fragile and hard to read.


Scopeds that don't have variables declared inside them don't need the bitfield patching. so that argument is completely pointless. Scopes that contain varaibles that are not pointers or refrences also don't need the patching.


Every time a function gets called that did not get generated by the
D compiler ( C / C++ etc functions) the compiler generates a call
into the runtime and passes the current stack pointer and stack base
pointer to it.

void _d_externalCallStart(void* stackptr, void* baseptr);

Every time such a function returns the compiler   generates a call
into the the runtime too.

void _d_externalCallEnd(void* stackptr, void* baseptr);

Every time a functions that can get called from other languages
(extern(C) etc) are executed the end callback is inserted at the
start of the functions and the start callback is inserted at the end
of the function.
Using these callbacks the GC can mark certain parts of the stack as
"non-D" and ignore them when scanning for bit fields and
references/pointers. It can just skip parts of the stack that are
"non-D" and therefore does not need a full stack frame within these
"non-D" sections.

This may not be a bad idea, though it does introduce some overhead when
you cross language boundaries.


All these features are required so that the GC can precisely scan
pointers/references on the stack and change them as necessary.
Remaining issues: The D compiler can freely move around value types
on the stack. With such move operations it would be necessary to fix
up all the bit fields. I needs to be investigated if this is doable.

This can only make the GC slower, especially if it needs to update
variable-length encoded bitfields. Of course, you may be able to offset
this by making it possible to do real-time GC, (reduced throughput but
less waiting time for collection cycles) but that's a very complex
problem.


3) Tracking references on the heap

For every class / struct a mixin template which is defined inside
the runtime gets instantiated. This template can then use
introspection to generate the necessary information to allow the GC
to scan the pointers within that struct / class precisely.

So basically you're proposing a compacting precise-scanning GC instead
of the current conservative GC. There are pros and cons in either
approach; it'd be nice if you could compare them.


I'm proposing a compacting percise-scanning generantional gc that has thread local pools and can scan these thread local pools without stopping the other threads. Also it will be able to collect young generations without the need to scan the old generations.


[...]
5) pointer / reference changed callback

Every time a pointer / reference is changed the D compiler emits a
call into the runtime and passes the new value of the reference /
pointer with it.

This introduces a LOT of overhead, especially in a language like D which
manipulates a lot of pointers quite often (esp. if you use slices a
lot).


I did not make this up, I know a smalltalk implementation that actually does this and is pretty efficient.


[...]
-If the GC interrupts a thread right before any of the above mentioned
callbacks happen it will cause a invalid state for the GC and the GC
might access invalid pointers. It has to be investigated if this leads
to invalid behavior. It can be fixed by not interrupting a thread but
pause it the next time it calls any of callbacks, or other functions
that can be interrupted by the GC.

This adds a lot of intermittent pauses in program execution.

Why should there be pauses, there is just a additional check in every callback to the gc there already is. When the gc wants to collect he sets the pause flag to true and waits until all required threads paused themselfs.


The link I posted at the top has a GC implementation that doesn't
introduce *any* of this overhead (the GC runs concurrently with the
program), with no pause during a collection cycle (garbage is
incrementally collected when allocating new memory).


Any non percise scanning algorithms will not be able to deal with memory fragmentation and will also have uneccessary overhead for scanning regions of memory that don't contain any pointers. Also they can leak memory because some int value has the same value as a pointer and therefore the gc does not free that block of memory.


[...]
6) Different interface for the GC

The current interface to the GC would have to change because the
"this block of memory might contain a pointer" approach wouldn't
work anymore. For example a block of memory and a delegate which
iterates over all pointers within the memory block could be used for
user allocated memory blocks. There should be a separate allocator
function provided by the GC that allocates memory that does not get
moved around so it can be used to pass it to non garbage collected
code.

It would be nice if there was a way for GCs to be pluggable, especially
in the compiler. Currently, we can only swap GCs that implement the same
interface as the existing one, but to switch to a different GC model
like you're proposing would require a lot of compiler support.


7) Compiler Options

Each of the above mentioned groups of features should be exposed as
compiler options so that you can turn them on/off depending on which
type of GC you use. Default on/off states for these features are set
within a config file depending on which type of GC currently ships
per default with druntime.

This would be nice.


8) Conclusion

Garbage Collection brings a lot of advantages for the programmer
using the language but is not free and shouldn't be treated as free.

I don't know, but after reading this:

        http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps

I think there might be a possibility of (almost) free GC.


There is no free GC. The only question is which trade offs you want to make. Modern implementations like the .NET 4.0 garbage collector show that all the things mentioned here are possible and are faster then primitve implementations.


Full support for garbage collection is required to build a fast and
efficient GC. This additional support requires additional features
within the compiler but should result in a overall better performing
language.

I agree in principle, although for specific GC proposals, we'd need to
evaluate the pros and cons to determine what is improved and what
degrades. Unfortunately, GC is a complex problem, and different GCs work
better with different apps. I wouldn't be so quick to make claims about
the performance of GCs. It depends on what the app does.


T


Fully agree on this

I want to add that I did not make all this up. Most of the mentioned features here are actually used in a Smalltalk implementation that compiles Smalltalk to C for faster execution.

Reply via email to