On Wed, 29 Dec 2010 12:27:02 -0700, Steven Schveighoffer
<[email protected]> wrote:
On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <[email protected]>
wrote:
On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer
<[email protected]> wrote:
On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <[email protected]>
wrote:
Second, the false pointer problem disappears (for practical purposes)
when you move to 64-bit.
I'm not sure I like this "solution", but you are correct. This is
somewhat mitigated, however, by the way memory is allocated (I'm
assuming not sparsely throughout the address space, and also low in
the address space). It certainly makes it less likely that a random
64-bit long points at data, but it's not inconceivable to have 32 bits
of 0 interspersed with non-zero data. For example, a struct with two
ints back to back, where one int is frequently 0, is quite plausible.
Ah, but the GC can allocate RAM in any section of the address space it
wants, so it would be easy for the upper 32 bits to be always non-zero
by design.
huh? How does the GC control whether you set one int to 0 and the other
not?
Hmm, perhaps an example would be best. Consider a 64-bit thread-local GC.
It might allocate RAM in the following pattern:
[2-bit shared/local][16-bit thread ID][6-bit tag][40-bit pointer]
So first, the address space is divided up into four regions: [11...] and
[00...] are left free for user use (external allocation, memory-mapped
files, etc.), and [01...]/[10...] are used to denote shared/thread-local.
Next you have the thread ID, which is one way to support thread-local
allocation/thread-local GCs.
Then you might have a 6-bit region that lock-free algorithms could use as
a tag (and would be ignored by the shared GC), or local GCs could use for
internal purposes.
Finally, you have a 40-bit region with what we normally think of as a
'pointer'.
The thing to understand is that 64-bit computers are really 40-bit
computers, currently. And given that 40 bits will hold us until we get to
petabytes, we should have 24 bits to add meta-info to our pointers for
some time to come. So, as long as we choose this meta-info carefully, we
can avoid common bit patterns and the associated false pointers.
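A rough sketch of how those meta-bits might be packed and unpacked (the
field widths follow the layout above; the constant names and values are
just illustrative):

import std.stdio;

enum ulong regionShift = 62;               // 2-bit shared/local region
enum ulong threadShift = 46;               // 16-bit thread ID
enum ulong tagShift    = 40;               // 6-bit tag
enum ulong addrMask    = (1UL << 40) - 1;  // low 40 bits: the 'real' pointer

ulong makeTagged(ulong region, ulong threadId, ulong tag, ulong addr)
{
    return (region   << regionShift)
         | (threadId << threadShift)
         | (tag      << tagShift)
         | (addr & addrMask);
}

ulong regionOf(ulong p) { return p >> regionShift; }
ulong threadOf(ulong p) { return (p >> threadShift) & 0xFFFF; }
ulong tagOf(ulong p)    { return (p >> tagShift) & 0x3F; }
ulong addrOf(ulong p)   { return p & addrMask; }

void main()
{
    // Region 0b01 = thread-local, thread 7, tag 0, some 40-bit address.
    ulong p = makeTagged(0b01, 7, 0, 0x12_3456_7890);
    writefln("region=%b thread=%d tag=%d addr=0x%x",
             regionOf(p), threadOf(p), tagOf(p), addrOf(p));
}

The point is just that the GC decides which of these words it hands out,
so it can keep the upper bits away from patterns that commonly occur in
user data.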
Third, modern GCs (i.e. thread-local GCs) can further reduce the
false pointer issue.
I'd rather have precise scanning :) There are issues with
thread-local GCs.
The only issue with thread-local GCs is that you can't cast to
immutable and then share the result across threads. And eventually,
we'll have a) better ways of constructing immutable and b) a deep idup
to mitigate this.
I'm talking about collection cycles -- they necessarily need to scan
both thread-local and shared heaps because of the possibility of
cross-heap pointing. Which means you gain very little from thread-local
heaps. The only thing it buys you is that you can assume the shared heap
has no pointers into local heaps, and local heaps have no pointers into
other local heaps. But if you can't run a collection on just one heap,
there is no point in separating them.
The defining feature of thread-local GCs is that they _can_ run collection
cycles independently from each other.
Plus it's easy to cast without moving the data, which is not undefined
currently if you take the necessary precautions, but would cause large
problems with separate heaps.
Casting from immutable/shared would be memory valid; it's casting to
immutable and shared where movement has to occur. As casting between
immutable/shared and mutable is logically invalid, I'd expect these
cases to be rare (once we solve the problem of safely constructing
complex immutable data).
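For concreteness, the pattern at issue looks roughly like this (a minimal
std.concurrency sketch; the names are just illustrative). With per-thread
heaps, the cast alone wouldn't be enough; the data would have to be moved
or idup'd:

import core.thread : thread_joinAll;
import std.concurrency;
import std.stdio;

void worker()
{
    // Receives a reference into what was the spawning thread's heap.
    auto msg = receiveOnly!(immutable(int)[])();
    writeln("worker sees: ", msg);
}

void main()
{
    int[] buf = [1, 2, 3];  // allocated on main's (would-be local) heap

    // The cast is what lets the data cross the thread boundary; with
    // thread-local heaps it would have to be moved or copied instead of
    // just being reinterpreted in place.
    auto frozen = cast(immutable(int)[]) buf;

    auto tid = spawn(&worker);
    tid.send(frozen);

    thread_joinAll();  // wait for the worker before exiting
}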
If we have the typeinfo of a memory block for the GC to parse, you can
also rule out cross-thread pointers without thread-local GCs (for
unshared data).
Which basically creates all the problems of thread-local GC, with very
few of the advantages.
What are the advantages? I'm not being sarcastic, I really don't know.
-Steve
The major advantage is that they match or outperform modern
generational/concurrent GCs, even when backed by conservative mark-sweep
collectors (according to Apple). (TLGCs can also be backed by modern
collectors.) Essentially, TLGCs work by separately allocating and
collecting objects which are not shared between threads. Since these
collectors don't have to be thread-safe, they can be more efficient in
their implementation, and collections only have to deal with their subset
of the heap and their own stack. This reduces pause times, false
pointers, etc. TLGCs are inherently parallel and have interleaved pause
times, which can greatly reduce the "embarrassing pause" effect (i.e. your
worker thread running out of RAM won't pause your GUI or stop other
workers from fulfilling HTTP requests).
In D, they become even more important, since to the best of my knowledge D
can never support generational GCs and only supports concurrent GCs on
certain OSes. So, if D wants a competitive GC, a thread-local GC is the
only cross-platform option I know of. (C function calls prevent most
generational/concurrent GC algorithms from being applicable to D, since
those algorithms generally rely on write barriers that can't be enforced
in code D doesn't compile.)