On Tue, Jul 16, 2013 at 12:07 AM, Bennie Kloosteman <[email protected]> wrote:
> I don't think the fact it's not in the kernel is a big reason either ...
> worst case you could write a kernel module or driver and pass it the data
> to do the same thing ... less elegant, but would work.

Not so. As was explained elsewhere, this is about doing very high speed TLB and page table entry invalidation. The problem that the Azul kernel changes were intended to solve was the need for a kernel fast path to do this. By fast path, think "O(100) asm instructions". Calling a kernel module is several orders of magnitude too slow.

> Jonathan, Singularity had like 8 different types of GCs; did they learn
> anything about switching collectors?

So first, I had no association with Singularity other than reading what has been published. And of course I couldn't respond to your question directly if I did. But I can say what I do know from general, public sources and non-proprietary conversations. Note that all of what I'm about to say predates the invention of continuous concurrent collectors.

Copying collection is best viewed as an optimization on mark-sweep. It is an optimization for the case where allocations exceed liveness, with a side benefit of induced cache locality in some cases. Two-space collectors and generational collectors are best viewed as optimizations on copying collectors that take advantage of general properties of object lifespans. But the key words here are "optimization" and "general". As the saying goes: "The difference between theory and practice is that in theory there is no difference between theory and practice, but in practice there is."

So *no* real application exactly matches any particular set of optimization assumptions. Worse, applications go through "regimes" in which their behavior changes modally. For example, a lot of stuff is created during app initialization that is retained for a long time; the generational intuition doesn't tend to kick in until the application reaches steady state. So of course there are pathological cases. The pathological case for malloc/free, by the way, is an application that throws heap data away nearly as fast as it allocates it.

But the other thing to say is that, barring a new result from continuous concurrent collection, there exists no single collector design today that fits all application scenarios. This, rather than performance, is the soft underbelly of the GC argument, and David Jeske's objections are obliquely trying to point this out. There are also issues in trading *physical* RAM for performance, which David has correctly identified. His numbers are stale. The currently relevant multiplier is 3x RAM rather than 10x RAM, but his fundamental point remains valid for pre-C4 collectors.

In any case, the two major collectors I know about for Singularity and Midori are the generic CLR collector and the STOPLESS work (and successors) that Bjarne Steensgaard did. The CLR collector is a traditional generational collector; I don't know what, if any, specialized support was added for Singularity or Midori. Bjarne's work has been reasonably well described in publications.

I can't comment on what Midori or Singularity did with their collectors, but I can point out a problem that arises in the presence of shared memory when different processes use different collectors: there tends to be an intimate relationship between the choice of collector and the design of the in-heap object header. Different collectors require different kinds of markers or interlocks on the objects, and the object header layout changes accordingly. When two different processes share memory, they have to agree on the semantics of the object headers well enough to cooperate. With distinct collectors running in the two processes, this is *very* hard to achieve.
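To make that concrete, here is a purely illustrative sketch in C. The layouts below are invented for the example; they are not the CLR's, Azul's, or any other real collector's headers. The point is only that each design bakes its own bookkeeping into the object header:

  /* Illustrative only: two hypothetical object header layouts. */
  #include <stdint.h>

  /* A mark-sweep style collector might keep a mark bit and an object
   * size in the header so the sweep phase can walk the heap linearly. */
  struct ms_header {
      unsigned mark : 1;        /* set by the marker, cleared by the sweeper */
      unsigned size_words : 31; /* used to find the next object during sweep */
      uint32_t type_id;         /* index into a type table, used for tracing */
  };

  /* A copying (semispace) collector instead needs room for a forwarding
   * pointer that is written when the object is evacuated to to-space. */
  struct copy_header {
      uintptr_t forward;        /* low bit: "already evacuated"; rest: new address */
      uint32_t  type_id;
  };

A mutator in one process that allocates objects with the first layout and shares them with a second process whose collector expects the second layout will have its mark/size word reinterpreted as a forwarding pointer; neither side's barriers or tracing code can run safely against the other's heap.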
One of the reasons that the Singularity "shared heap" had to be referenced by linear-typed references is that it eliminates the need for GC to visit the shared heap. Rust's counted pointers could be used instead, and in some respects would be better. Rust's "owned" pointer is *not* a substitute, because ownership can't be transferred. That's the main difference between an owned pointer and a linear pointer. Dealing with two heaps in a single process is tricky. Dealing with a heap that is shared across process boundaries is even trickier.

> That said, who is to say that the Azul collector is not 50% slower ...
> there is very little information about this.

Actually, there is quite a lot about this. The key statement is that they have measurements for heaps as big as 300GB indicating that *in principle* the mutator cannot outpace the collector. What I do not remember is what percentage of total multicore CPU capacity and what percentage of total memory bandwidth is required to achieve that. I do remember that it was surprisingly modest.

> I'm sure you can build some synchronised GC with safe spots with minimal
> pauses, but can you do it and not trash cache hits and generate lots of
> context switches?

No. But that's not how these newer collectors work. The newer collectors *rely* on multicore for their success. Fortunately, multicore seems to be the way of the future whether we want it or not. So we now seem to be in a design space where two types of collectors remain relevant:

1. Ones where total RAM is small enough for simple, conventional collection to make sense.
2. Ones where multicore collectors make sense.

> Re concurrent GCs, yes, I prefer to use the term pauseless.

Pauseless really isn't the objective in concurrent collectors. Generational collectors are effectively pauseless for the majority of applications. I'm aware that there are exceptions, but the term "pauseless" has been effectively co-opted in the literature to exclude those cases. Better to use a new term. The goal for the concurrent collectors is to be "stopless". That is: the mutator is *never* halted, or at least, the worst-case halt is measured in microseconds and applies only to a single mutating thread. One of the big problems in single-core concurrent collectors is the need to synchronize all of the mutators on the collector. *That* turns out to be the big source of delay in many designs. That's why the STOPLESS type of design matters.

> I also think the art of memory management is still useful with GC .. but
> little used, where in C it is often used .. if you test it and create too
> many objects, reduce it .. e.g. if you have a non-array linked list, reuse
> your nodes; if you have vertex buffers or buffer[], reuse them.

I agree completely that the art of memory management remains important in a world of GC. But "reuse" is not what people generally mean when they talk about memory management. Reuse is an idiom; memory management is a mechanism.
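For what it's worth, a minimal sketch of that reuse idiom, written here in C with invented names; in a GC'd language the same pattern simply keeps retired nodes reachable on a free list instead of letting the collector reclaim and re-create them:

  /* Reuse idiom: recycle linked-list nodes on a private free list
   * instead of handing them back to the allocator each time. */
  #include <stdlib.h>

  struct node {
      struct node *next;
      int          value;
  };

  static struct node *free_list = NULL;   /* nodes waiting to be reused */

  static struct node *node_alloc(int value) {
      struct node *n;
      if (free_list != NULL) {            /* reuse a retired node if available */
          n = free_list;
          free_list = n->next;
      } else {
          n = malloc(sizeof *n);          /* otherwise fall back to the allocator */
          if (n == NULL) return NULL;
      }
      n->next = NULL;
      n->value = value;
      return n;
  }

  static void node_retire(struct node *n) {
      n->next = free_list;                /* don't free(); keep it for reuse */
      free_list = n;
  }

The trade-off is the usual one: the pool caps allocator (or collector) traffic at the cost of holding on to peak memory indefinitely.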
Jonathan

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
