On Sun, 28 Mar 2010 09:40:10 -0300, bearophile <[email protected]>
wrote:
This is just a small invention.
Enums can decrease the precision of the GC, because fields can be
pointers or values, an example:
enum NodeType { value, sum, product }
struct Node {
NodeType type;
union {
double val;
struct {
Node* left, right;
}
}
}
(Node can be used to build a simple expression tree, that can contain
things like: 2.5 * 3.0 + 2.0).
In this case the GC has to use the safe strategy of always following the
left and right pointers, even when they are a double.
At runtime the program usually knows what's inside the enum, if the enum
truly contains pointers. This information can be inside a tag as in this
example, inside a tagged pointer that points to the struct/enum, or at
worst in the code that uses the struct/enum. But generally the compiler
is not able to use this information, because a compiler usually doesn't
understand the program semantics.
So I can think of a new optional standard class/struct/enum method, for
example gcfollow() that the garbage collector recognizes when it is
present and useful. It gets called with the attribute name and returns
true at runtime if the the GC has to follow a pointer/reference:
struct Node {
NodeType type;
union {
double val;
struct {
Node* left, right;
}
}
bool gcfollow(string field)() {
return type != NodeType.value;
}
}
Note that the GC will call gcfollow() (here a template for more
efficiency) only with the strings of "left" or "right", because "type"
and "val" attributes can't contain pointers/references.
So this gcfollow() will be instantiated only with "left" or "right", and
in both cases it returns true if the node isn't a value, so the GC will
not mistake the "val" double in the leaves of this expression tree as a
couple of pointers. The GC will follow the left and right pointers only
for the internal nodes of the tree. This slows down the GC a little but
increases its precision (if gcfollow() has no bugs).
If gcfollow() is not present the GC acts as it does now, assuming the
pointers are always present.
If a struct contains no pointers/references the gcfollow() is never
used. If a struct contains another struct, the GC will call gcfollow()
of the inner struct too if it contains pointers/references, etc.
Bye,
bearophile
This would require structs/arrays/etc to contain a header with a vtable,
so the GC could know what to do. You'd probably save memory (the headers
cost 16 bytes) and would definitely save collection time simply breaking
those unions into things with pointers and things without pointers. In
your example, doing that would cost some extra memory, as you'd go from a
16-byte block to a 32-byte block. However, remember, the GC allocates on
16-byte boundaries so each Node* actually has 4-bits (8 total) in which to
hide an enum.