Ivan Volosyuk wrote:
On 11/9/06, Etienne Gagnon <[EMAIL PROTECTED]> wrote:
Ivan Volosyuk wrote:
> We will get rid of false sharing. That's true. But it will still be quite
> expensive to write those '1' values, because of ping-ponging of the
> cache line between processors. I see only one solution to this: use
> separate mark bits in vtable per GC thread which should reside in
> different cache lines and different from that word containing gcmap
> pointer.

The only thing that a GC thread does is write "1" in this slot; it never
writes "0".  So, it is not very important in what order (or even "when")
this word is finally committed to main memory, as long as there is some
barrier before the "end of epoch collection" ensuring that all
processors' cache write buffers are committed to memory before tracing
vtables (or gc maps).

You don't need memory coherency on write-without-read. :-)
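
As an illustration of the ordering argument above only (it does not address the cache-line traffic raised in the reply that follows), here is a minimal C11 sketch. The names vt_mark_slot, gc_mark_vtable, gc_thread_finish_epoch, gc_epoch_complete and vt_is_live are hypothetical and are not DRLVM APIs. Marking is an unconditional relaxed store, and the only synchronization is a release/acquire counter that stands in for the end-of-epoch barrier.

```c
#include <stdatomic.h>

/* Hypothetical per-vtable mark word; name and layout are assumptions. */
typedef struct { atomic_uchar live; } vt_mark_slot;

/* Number of GC threads that have finished the current epoch. */
static atomic_uint gc_threads_done;

/* Hot path: unconditional store of 1. No read and no ordering needed. */
static inline void gc_mark_vtable(vt_mark_slot *slot) {
    atomic_store_explicit(&slot->live, 1, memory_order_relaxed);
}

/* Called once per GC thread at the end of the epoch. The release
   increment orders this thread's earlier mark stores before it. */
static inline void gc_thread_finish_epoch(void) {
    atomic_fetch_add_explicit(&gc_threads_done, 1, memory_order_release);
}

/* Polled by the thread that will trace vtables. Once this returns
   nonzero, the acquire load has made every finished thread's marks
   visible to the caller. */
static inline int gc_epoch_complete(unsigned n_threads) {
    return atomic_load_explicit(&gc_threads_done,
                                memory_order_acquire) == n_threads;
}

/* Only meaningful after gc_epoch_complete() has returned nonzero. */
static inline int vt_is_live(vt_mark_slot *slot) {
    return atomic_load_explicit(&slot->live, memory_order_relaxed) != 0;
}
```

In this sketch the cost of ordering is paid once per thread per epoch rather than once per mark, which is the point being made about write-without-read.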

I'm not talking about memory coherency; I'm talking about bus load from
useless memory traffic between processors and poor CPU cache usage.

Surely this wouldn't happen in a sufficiently weak memory model? Let's just not support x64 :-)

But I think this false sharing may be what kills this particular idea.
The next cheapest option should be to use a side array of bytes - as long as calculating the address of the mark byte can be done without any loads or register spills, it should still be cheaper than a full test-and-mark operation on the vtable. I guess there are cache policies where this may still be slow on an SMP machine.

Side metadata is easiest to do when the objects are in a specific space and have coarse alignment. Any idea what the typical size of a DRLVM vtable is? Would 256 bytes be an excessive alignment boundary?
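
A minimal sketch of the side mark-byte idea, under stated assumptions: all vtables live in one contiguous space (vt_space here, a hypothetical 1 MiB static region) and each vtable is aligned to 256 bytes. None of these names or sizes come from DRLVM. Because the space base is a link-time constant in this sketch, the mark address is computed with a subtract, a shift and an add, with no loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: one contiguous vtable space, 256-byte aligned vtables. */
#define VT_ALIGN_LOG2 8u                          /* 2^8 = 256 bytes      */
#define VT_SPACE_SIZE (1u << 20)                  /* hypothetical 1 MiB   */
#define VT_SLOTS      (VT_SPACE_SIZE >> VT_ALIGN_LOG2)

static _Alignas(256) unsigned char vt_space[VT_SPACE_SIZE];
static unsigned char vt_marks[VT_SLOTS];          /* one byte per vtable  */

/* Map a vtable address to the index of its mark byte. */
static inline size_t vt_mark_index(const void *vtable) {
    size_t idx = (size_t)((uintptr_t)vtable - (uintptr_t)vt_space)
                 >> VT_ALIGN_LOG2;
    assert(idx < VT_SLOTS);                       /* debug builds only    */
    return idx;
}

/* Unconditional store of 1: no test, no read of the old value.
   A multi-threaded collector would want a relaxed atomic store here;
   a plain store keeps the sketch short. */
static inline void vt_mark(const void *vtable) {
    vt_marks[vt_mark_index(vtable)] = 1;
}

static inline int vt_is_marked(const void *vtable) {
    return vt_marks[vt_mark_index(vtable)] != 0;
}

/* Reset all marks at the start of an epoch. */
static void vt_clear_marks(void) {
    memset(vt_marks, 0, sizeof vt_marks);
}
```

With 256-byte alignment, 64 mark bytes share one 64-byte cache line, so vtables that are close together in the space can still contend for a line on an SMP machine; a coarser alignment or padded mark entries would trade memory for less sharing.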

I'll try it out in the next day or so. Unfortunately I don't have access to anything with more parallelism than a Pentium D, so it's not likely to be conclusive.

--
Robin Garner
Dept. of Computer Science
Australian National University
http://cs.anu.edu.au/people/Robin.Garner/
