Re: A benchmark, mostly GC

Robert Jacques Mon, 12 Dec 2011 20:30:53 -0800

On Mon, 12 Dec 2011 10:44:27 -0500, Timon Gehr <[email protected]> wrote:

On 12/12/2011 04:17 PM, Robert Jacques wrote:

On Mon, 12 Dec 2011 01:06:14 -0500, Brad Anderson <[email protected]> wrote:

On Sun, Dec 11, 2011 at 10:55 PM, Robert Jacques <[email protected]>
wrote:

Second, being a systems language means that D can not implement a lot of

GC algorithms including copying, generational and the good concurrent
collectors.

What about being a systems language prevents generational? The page on
garbage collection on the D website says while it doesn't use a
generational GC it will some day and gives tips on what to avoid so you
don't fall victim to the behavior of a moving GC.


Regarding moving collectors.
D supports semi-precise collection. Therefore D can support some types
of moving collectors, i.e. compactors, which is what the website is
talking about. Copying collectors, on the other hand, require full
precision to work; you must be able to fully evacuate a region of
memory. D doesn't support full precision, for a couple of performance
(unions,


The compiler could insert small code fragments that track whether or not
an union contains a pointer.

the call stack,


What is the problem with the call stack? Can't the compiler just
generate reference offset information for all the function frames and
then the GC generates a backtrace to identify the references?

C/C++ interop)


There should be multiple options for GC. If C/C++ interop is
unimportant, a better GC that does not support it well is still handy.

and technical reasons (the inline
assembler).


inline assembler does not always move around GC heap references. I think
that in the cases it does, reference stores could be annotated manually.


Solving these problems is a little more complicated than this, but the 
existence of solutions wasn't the problem; performance of those solutions was.

Unions can be instrumented, but the instrumentation must be harmonized with the 
GC as pointer bit masks must be changed. And changeable bitmasks present their 
own overhead problem (i.e. a 1/32 or 1/64 memory overhead). Personally, I'm in 
favor of a precise heap, so I'd like this, but it does require a compiler, not 
runtime, change.

The call stack is more complicated because you have to view the function frame 
as a bunch of unions as space in the function frame is heavily reused to 
minimize cache effects and the change of a stack overflow. So, before every 
function call you'd have to flush the current reference offsets of the function 
frame. *opps* I was remembering why dual stacks doesn't work, and the same 
logic applies to function frames. So a dual stack approach would use one stack 
for references and one stack for values. However, pointers to 
unions/structs/classes on the dual stack don't work because to values and 
pointers in the objects are not together anymore. Similarly, pointers to unions 
on a precise/framed stack are troublesome because the the assignment wouldn't 
know where the meta-data was. Hmm... The GC could have a pointer bitmask for 
each stack, which the stack frame flushes prior to each function call.

C/C++ interop is never unimportant; at a minimum all OS calls work through 
C/C++ interop. So while it may not be use directly by most D programs, under 
the hood, the libraries we use all depend on it. We'd have to be able to 
annotate C functions with their marshaling requirements and change how we 
handle them based on the GC we're using. Things get even more icky when one 
considers registering D callbacks or C functions calling into D DLLs.

As for the inline assembler, we'd have to annotate changes to both the GC heap 
and the call stack, and since this would have to be done manually, it would 
become a source of nasty memory bugs.

Technically, the performance of certain fundamental operations is an 
implementation detail, but there's a certain expectation of 'system' languages 
to be lean and mean, either optionally or by default and as pointer tracking 
eliminates that option. Furthermore, pointer tracking interferes with language 
interop in rather nasty ways and may even necessitate changes to the ABI. But 
that doesn't mean a D dialect couldn't do a nice modern GC; indead the .net D 
compiler already did, and IIRC LDC/LLVM has a JIT backend.

Regarding generational collectors.
Both generational and concurrent collectors require that every pointer
assignment is known to the compiler, which then instruments the
assignment to flag mark bits, etc. For generational collectors, you need
this information to know which objects/memory pages to search for roots
into the young generation. Without this information, you have to search
the entire heap, i.e. do a full collection. Again, both performance and
technical reasons come into play here. Instrumentation represents a
performance cost, which even if it pays for itself, looks bad in
newsgroups posting. Indeed, concurrent collectors are mostly about
trading throughput for latency. So, like JAVA, you'd want to use version
statements to select your GC style, but you'd also have to make sure
your entire codebase was compiled with the same flags; with 3rd party
DLLs and objects, this can become non-trivial. From a technical
perspective, complete pointer assignment instrumentation is a
non-starter because the D compiler doesn't have complete access to all
the code; both C/C++ and assembler code can modify pointers and are not
subject to instrumentation.  Now, if we supported C/C++ through
marshaling, like JAVA and C# do, and made the assembler a bit more smart
or required manual pointer instrumentation of asm code, we could use
these types of collectors.

* Note that the above doesn't take into account the types of virtual
memory tricks C4 can do, which may open these algorithms up to D and
other system programming languages.


I think we'll definitely need a generational/concurrent collector
eventually.


Thread-local collectors are easy todo in D, much easier in fact than the 
majority of other languages, and have performance similar to the current set of 
generational/concurrent collectors. And generational collectors are mainly to 
account for the fact Java doesn't have structs. So while I do think D needs a 
modern collector, I don't think that a generational/concurrent collector is 
absolutely required.

Could some of the problems be worked around by having more
than one GC implementation in the same executable?


Yes and no. GCs that require instrumentation to function would require entirely 
separate codebases. So supporting two different GCs would require fat-objects 
of some kind (i.e. N different executables)

Re: A benchmark, mostly GC

Reply via email to