On 11.06.2014 18:59, Dmitry Olshansky wrote:
> 03-Jun-2014 11:35, Rainer Schuetze wrote:
>> Hi,
>>
>> more GC talk: the last couple of days, I've been experimenting with
>> implementing a concurrent GC on Windows inspired by Leandro's CDGC.
>> Here's a report on my experiments:
>>
>> http://rainers.github.io/visuald/druntime/concurrentgc.html
>>
>> tl;dr: there is a working(?) partially concurrent GC here:
>> https://github.com/rainers/druntime/tree/concurrent_gc2
>> but it eats a whole lot of memory.
>
> I'm not sure how it would perform, but what about the following idea
> for CoW snapshotting?
>
> 1. Don't use a separate process and a shared view; use a background thread.
> 2. Allocate the heap space with MapViewOfFile, but private, not shared.
>
> During a collection, create the snapshot (while the world is stopped) as follows:
>
> 1. Make a new read-write view of the heap with MapViewOfFile; this is what
> the background thread will use to scan memory.
> 2. Save the base address of the original view, and unmap it.
> 3. Call MapViewOfFileEx with FILE_MAP_COPY and the old base address (that
> gives us back the same heap, but in CoW mode).
>
> ... then move on with the application and let the background thread do
> the marking.
>
> Once marking is done, stop the world again and:
>
> 1. Run VirtualQuery (or QueryWorkingSetEx) over the application's CoW view
> of the heap and find the ranges of pages that have flipped to
> PAGE_READWRITE; those are the ones that were modified and duplicated. If
> new allocations are served from this view, these ranges will also include
> the newly allocated objects.
> 2. Copy these ranges of pages over to the normal read-write view.
> 3. Close the CoW mapping (thereby freeing the duplicates) and remap it as a
> read-write view at the same address with MapViewOfFileEx.
>
> I wasn't able to get QueryWorkingSetEx to behave, but I believe it must
> be a better way than VirtualQuery-ing every page range to death.
>
> See the sketch of the idea here:
> https://gist.github.com/DmitryOlshansky/5e32057e047425480f0e
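The following portable D simulation illustrates the snapshot bookkeeping in the proposal. It makes no MapViewOfFile/MapViewOfFileEx calls. An associative array stands in for the pages the OS would duplicate on first write. The page size, `CowHeap` and its method names are hypothetical. The sketch only demonstrates the invariant the scheme relies on: the marker keeps seeing the heap as it was at snapshot time, and after the second stop-the-world the duplicated pages carry the application's writes back into the read-write view.

```d
// Portable simulation of the CoW snapshot bookkeeping (no Win32 calls).
// "section" stands for the read-write view the background marker scans;
// "privateCopies" stands for the pages the CoW view has duplicated.
import std.algorithm : sort;

enum PAGE = 4096;

struct CowHeap
{
    ubyte[][] section;              // snapshot seen by the marker
    ubyte[][size_t] privateCopies;  // page index -> application's private copy

    this(size_t pages)
    {
        section = new ubyte[][](pages, PAGE);
    }

    // Application read through the CoW view: a private copy wins if present.
    ubyte read(size_t addr) const
    {
        immutable page = addr / PAGE;
        if (auto p = page in privateCopies)
            return (*p)[addr % PAGE];
        return section[page][addr % PAGE];
    }

    // Application write through the CoW view: duplicate the page on first write.
    void write(size_t addr, ubyte value)
    {
        immutable page = addr / PAGE;
        if (page !in privateCopies)
            privateCopies[page] = section[page].dup;
        privateCopies[page][addr % PAGE] = value;
    }

    // Background marker: always reads the state frozen at snapshot time.
    ubyte scan(size_t addr) const
    {
        return section[addr / PAGE][addr % PAGE];
    }

    // Second stop-the-world: copy the duplicated pages back into the
    // read-write view, then drop the duplicates (like remapping the CoW view).
    size_t[] merge()
    {
        auto dirty = privateCopies.keys;
        sort(dirty);
        foreach (page; dirty)
            section[page][] = privateCopies[page][];
        privateCopies = null;
        return dirty;
    }
}
```

In the real scheme the OS performs the duplication in `write` transparently. The keys of `privateCopies` correspond to the pages that VirtualQuery or QueryWorkingSetEx would report as modified.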
Cool stuff! I remember trying something similar, but IIRC forcing the
same address with MapViewOfFile somehow failed (maybe this was across
processes). I tried your version on both Win32 and Win64 successfully,
though.
I implemented the QueryWorkingSetEx version like this (you need a
converted psapi.lib for Win32):
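    enum PAGES = 512; // SIZE / 4096 covers the whole heap; for a large heap,
                      // allocate info dynamically rather than on the stack
    PSAPI_WORKING_SET_EX_INFORMATION[PAGES] info;
    foreach (i, ref inf; info)
        inf.VirtualAddress = heap + i * 4096;  // one entry per 4 KB page
    // info.sizeof is the byte size of the whole static array; for a dynamic
    // array it would have to be info.length * info[0].sizeof
    if (!QueryWorkingSetEx(GetCurrentProcess(), info.ptr, info.sizeof))
        throw new Exception(format("Could not query info (%d).", GetLastError()));
    foreach (i, ref inf; info)
        writefln("flags page %d: %x", i, inf.VirtualAttributes);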
Then you can check the "Shared" bit of VirtualAttributes to find the copied
pages. This function is not supported on XP, though.
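The "Shared" check reduces to plain bit tests. This minimal D sketch assumes the PSAPI_WORKING_SET_EX_BLOCK layout from the Windows SDK (Valid in bit 0, ShareCount in bits 1-3, Win32Protection in bits 4-14, Shared in bit 15) and assumes that VirtualAttributes is read as a plain size_t. The helper names and `PageRange` are hypothetical. The sketch also coalesces copied pages into runs, so the copy-back step can move each range with one slice copy instead of copying page by page.

```d
// Bit positions assumed from the Windows SDK's PSAPI_WORKING_SET_EX_BLOCK.
enum size_t VALID_BIT = 1 << 0;
enum size_t SHARED_BIT = 1 << 15;

bool isValid(size_t attrs)  { return (attrs & VALID_BIT) != 0; }
bool isShared(size_t attrs) { return (attrs & SHARED_BIT) != 0; }

// Win32Protection occupies bits 4-14 (11 bits), e.g. PAGE_READWRITE == 4.
uint protection(size_t attrs) { return cast(uint)((attrs >> 4) & 0x7FF); }

// Assumption: a resident page of the CoW view that is no longer Shared was
// privately copied on write. Copied pages trimmed from the working set
// (Valid == 0) would be missed by this test.
bool isCopied(size_t attrs) { return isValid(attrs) && !isShared(attrs); }

// A run of consecutive copied pages: [first, first + count).
struct PageRange { size_t first, count; }

// Coalesce copied pages so each run can be copied back with one slice copy.
PageRange[] copiedRanges(const size_t[] attrs)
{
    PageRange[] ranges;
    foreach (i, a; attrs)
    {
        if (!isCopied(a))
            continue;
        if (ranges.length && ranges[$ - 1].first + ranges[$ - 1].count == i)
            ranges[$ - 1].count++;     // extends the current run
        else
            ranges ~= PageRange(i, 1); // starts a new run
    }
    return ranges;
}
```

With the info array from the snippet above, the input would be the VirtualAttributes of each entry. A page index i maps back to the address heap + i * 4096.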
A short benchmark shows that VirtualQuery needs 55/42 ms for your test
on Win32/Win64 on my mobile i7, while QueryWorkingSetEx takes about 17
ms for both.
If I add the actual copy into heap2 (i.e. every fourth page of 512 MB is
copied), I get 80-90 ms more.
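For scale, the figures above work out as follows. This assumes 4 KB pages, reads "512 MB" as 512 MiB, and assumes the 17 ms covers the whole 512 MiB test heap. Treating the extra 80-90 ms as pure copy time is a simplification.

```latex
\begin{aligned}
N_{\text{pages}} &= \frac{512 \cdot 2^{20}\,\text{B}}{4096\,\text{B}} = 131072 \\
N_{\text{copied}} &= \frac{131072}{4} = 32768 \text{ pages} = 128\,\text{MiB} \\
\text{copy rate} &\approx \frac{128\,\text{MiB}}{80 \text{ to } 90\,\text{ms}} \approx 1.4 \text{ to } 1.6\,\text{GiB/s} \\
t_{\text{QueryWorkingSetEx}} &\approx \frac{17\,\text{ms}}{131072} \approx 130\,\text{ns per page} \\
\text{info buffer (Win64)} &= 131072 \cdot 16\,\text{B} = 2\,\text{MiB}
\end{aligned}
```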
The numbers are not great, but I guess the usual memory usage and number
of modified pages will be much lower. I'll see if I can integrate this
into the concurrent implementation.