Destructors will almost certainly be run on the background thread. Although those run at shutdown will likely be on the main thread.

I hope the overall performance results are good. I really need to see the timing of the collections to know. But in any case it does indicate that some work on removing lock contention is worthwhile.

On 13/05/2014 5:04 PM, Rainer Schuetze via Digitalmars-d wrote:

This sounds pretty cool.

On 13.05.2014 06:42, Adam Sakareassen via Digitalmars-d wrote:
Hi all,

[...]
All memory allocations are entirely lock free (using CAS instructions).
  So a pre-empted thread will never block another.   For allocations of
less than 128 bytes, each thread is allocated memory from it's own
memory pool to avoid false sharing on the CPU's cache.

There was a GSoC project a few years ago which was planning to do
something similar, but then switched to go for implementing precise
scanning. I always wanted to add lock-free memory allocations, too.

[...]

The mark phase needs to stop the world.  The sweeping portion of the
collection will run in the background.   This is similar to the current
implementation as the world is restarted after the mark phase, however
the thread doing the collection will not allocate the requested memory
to the calling thread until after the sweep has completed.  This means
that single threaded applications always wait for the full garbage
collection cycle.

Does this mean that destructors/finalizers are called from a different
thread? It was never guaranteed to be run on the same thread as the
allocation of the object so far, but effectively that happens for
single-threaded applications. It could help to continue doing that, some
resources have to be released from the same thread they have been
acquired (e.g. UI handles). For multi-threaded applications, I'm fine
with "guaranteeing" that the destructor is never run on the same thread,
though some standard mechanism should exist to forward to another thread.


So far allocation speed seems to have improved.  I can't test collection
speed as it's not complete.  As a test I wrote a simple function that
allocates a linked list of 2 million items.  This function is then
spawned by 20 threads.  This test script is shown below.  Timing for
allocation (with GC disabled) is as follows.  (Using DMD 2.065)

Existing GC code:  15700ms (average)
My GC code:   500ms (Average)

Very promising numbers :-)


When performing the same amount of allocations on a single thread, the
new code is still slightly faster than the old.

What this demonstrates is that the locking mechanisms in the current GC
code is a huge overhead for multi threaded applications that perform a
lot of memory allocations.  (ie.  Use the “new” operator or dynamic
arrays.)

It would be nice to see the default GC and memory allocator improved.
There is certainly room for improvement on the allocator end which may
mask some of the performance issues associated with garbage collection.

In the future I think D needs to look at making collection precise.  It
would not be too hard to adjust the mark and sweep GC to be nearly
precise.  The language needs to support precise GC before things like
moving garbage collection become feasible.

You might check how compatible your implementation is with this:
https://github.com/rainers/druntime/tree/gcx_precise2


Reply via email to