On Sat, Jul 13, 2013 at 10:38 PM, David Jeske <[email protected]> wrote:
> On Sat, Jul 13, 2013 at 3:39 PM, Jonathan S. Shapiro <[email protected]> wrote:
>
>> I think the question of library allocation is subtle. Allocation *per se*
>> isn't the problem. The problem is GC pressure, and short-lived objects
>> generate qualitatively different pressures than long-lived objects.
>
> On the contrary. When I said GC pressure, I am specifically thinking of
> pressure on the tracing work -- aka the absolute number of (and efficiency
> of scanning) traced pointers. So allocation (of GC traced items) *is* the
> exact problem I'm thinking about.

Nit and no. First, the nit: the important metrics for GC performance are the number of cache lines touched and (in a concurrent GC) the number of objects contended (because of locking requirements). I agree that in most GCs the number of object references is a good predictor of the number of cache lines touched, but we shouldn't forget the lower-level view, because we're having a discussion elsewhere about concurrent collectors.

I'm going to assume from this point forward that we are talking about some form of copying collector. It's pretty well understood that mark/sweep collectors are a dead end on the evolutionary tree.

The consequence of this assumption is that allocation is not a good predictor of tracing pressure. Tracing pressure is most directly a function of *reachability* rather than of the number of allocations. More precisely: allocations trigger collections, but the cost of a collection is a function of liveness. See? Isn't that a completely transparent and simple thing to think about? :-)

Let's look at two concrete examples.

Concrete example 1: in the lexer I'm currently playing with (written in C#), I have a class that represents an input position. I had a choice between a struct and a class for this, and I chose a class. There is a mutable reference m_loc containing an object reference to a LexLoc. The LexLoc itself is treated as immutable.
Within the lexer, the pattern is

    m_loc = m_loc.updateWith(s); // s is the string being consumed

Note that the returned reference is always a newly allocated object. So this creates memory pressure because I'm constantly allocating new LexLocs. That in turn triggers GC. On the other hand, the start location of the token is the initial value of m_loc about half the time (the other half there is leading white space), and the *final* position of the token (I track both) is *invariably* the m_loc value at the end of a given call to GetToken().

So it's a judgment call and a balancing exercise. At a guess, I'd say that 20% of the LexLocs I produce don't survive outside of the Tokenizer. In exchange, I don't have to keep track of which cases are which, and I suspect that the overall reliability of the system is improved. That is: I'm intentionally trading GC pressure for robustness, which is a consideration that is often neglected in these discussions.

Concrete example 2: in the same lexer, for various reasons, I've chosen to implement hypothetically infinite character lookahead for use within the tokenizer. The character lookahead lets me write tokenizers that can tell the difference between 'a (a type variable) and 'a' (a character literal). It turns out that character I/O is pretty slow, and more so when you use UCS-2. But I *really* don't want a push-back mechanism for that, both because of the perf cost and because of [de]allocation and GC pressure concerns. So in that case I've implemented a large-ish input character buffer. Characters are eaten from the front until a fill() is needed, at which point the residual few characters are moved to the bottom and the remaining buffer is filled from the underlying input stream. This is much more efficient than trying to keep track of push-back.

My point in giving these two examples is mostly in support of things David says below.
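For concreteness, the immutable-location idiom from example 1 might look roughly like this. This is a hedged sketch in Java rather than the actual C# code: only LexLoc and updateWith are named in the mail; the line/column fields and the update rule are invented for illustration.

```java
// Sketch of the immutable-location idiom: LexLoc and updateWith come from
// the mail; the fields and the update rule are invented.
final class LexLoc {
    final int line;
    final int column;

    LexLoc(int line, int column) {
        this.line = line;
        this.column = column;
    }

    // Returns a *new* LexLoc; the receiver is never mutated, so any
    // previously captured position (token start, token end) stays valid
    // without the caller tracking which cases are which.
    LexLoc updateWith(String s) {
        int l = line, c = column;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\n') { l++; c = 0; } else { c++; }
        }
        return new LexLoc(l, c);
    }
}
```

The cost, as described above, is a fresh allocation on every update; the payoff is that captured positions can never be clobbered behind the caller's back.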
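And example 2's refill scheme might be sketched like so, again in Java rather than C#, and again hedged: only fill() is named in the mail; the class, its other methods, and the buffer policy are invented for illustration.

```java
import java.io.IOException;
import java.io.Reader;

// Sketch of the lookahead buffer from example 2. Only fill() is named in
// the mail; everything else here is an invented illustration.
final class CharBuffer {
    private final Reader in;
    private final char[] buf;
    private int pos;   // index of the next unread character
    private int limit; // one past the last valid character

    CharBuffer(Reader in, int capacity) {
        this.in = in;
        this.buf = new char[capacity];
    }

    // Move the residual unread characters to the bottom of the buffer,
    // then top it up from the underlying stream -- no push-back needed.
    private void fill() throws IOException {
        int residual = limit - pos;
        System.arraycopy(buf, pos, buf, 0, residual);
        pos = 0;
        limit = residual;
        int n = in.read(buf, limit, buf.length - limit);
        if (n > 0) limit += n;
    }

    // Look ahead k characters without consuming; -1 at end of input.
    int peek(int k) throws IOException {
        if (pos + k >= limit) fill();
        return (pos + k < limit) ? buf[pos + k] : -1;
    }

    // Consume one character from the front of the buffer.
    int next() throws IOException {
        int c = peek(0);
        if (c != -1) pos++;
        return c;
    }
}
```

With peek(2) available, the tokenizer can see whether 'a is followed by a closing quote before committing to either the type-variable or character-literal interpretation.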
There are important use-cases for reuse of storage, and there are also well-motivated use cases for exploiting the automated resource management provided by GC. But I'd also note that *neither* of these choices was motivated by concerns about the performance of the GC subsystem. The first was motivated by robustness and the second by I/O performance considerations.

> This came out of the observation that in large performant C programs,
> substantial amounts of data are often managed with stack allocation,
> struct-arrays, reference-counting, and reuse-queues. If we have a typesafe
> way to handle these cases, perhaps the number of GC traced objects/handles
> could be kept small enough to make GC tracing a trivial task.

BitC provided a lot of what we need, and so do the better functional language compilers. I tend to take it as given that we will do those optimizations. But more importantly, we need to be able to express those uses of data structures prescriptively. I made that argument in BitC, and I still believe it strongly.

>> 1. Allocations that are reachable from the return value [young]
>
> We already know it is trivial to perform a transformation which pushes the
> allocation of these objects out to the caller. We turn "byte[] read(File
> fp)" into "void read(File fp, byte[] buf)", as is already often done for
> *certain* APIs. Two challenges are that (a) using the latter is less
> convenient than the former in non-performance-centric code; (b) libraries
> are free to make the wrong choice and enforce it on us.

I don't think the language should hide this. The particular example you gave can be handled just fine with overloading. It's not obvious that one way is better than the other, particularly when the programmer may be making choices about functional vs. stateful idioms for reasons other than performance.

> Enforcing (b) seems much harder.
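To make the overloading point concrete, both shapes of the read API can coexist, with the convenient form defined in terms of the caller-allocated form. A hedged Java sketch (the ReadApi class is invented; the two signatures are the ones from the exchange above):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Sketch: the caller-allocated and convenient forms as overloads.
// (ReadApi is an invented name; the signatures follow the mail.)
final class ReadApi {
    // Caller-allocated form: no allocation on the library side, so the
    // caller controls buffer reuse and the resulting GC pressure.
    static int read(File fp, byte[] buf) throws IOException {
        try (FileInputStream in = new FileInputStream(fp)) {
            return in.read(buf, 0, buf.length);
        }
    }

    // Convenient form: allocates a fresh array and delegates to the
    // caller-allocated overload.
    static byte[] read(File fp) throws IOException {
        byte[] buf = new byte[(int) fp.length()];
        read(fp, buf);
        return buf;
    }
}
```

Neither overload forecloses the other, which is the point: the performance-sensitive caller reuses a buffer, everyone else takes the allocating form.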
Again, we have to be careful about not focusing on a single metric of "right", but tools to provide design idiom hints seem like a useful thing to have here.

>> 2. Allocations for computational storage internal to the call [transient]
>>
>> 4. Allocations that become reachable from the internal global state of the
>> library. [potentially long lived]
>
> I am grouping these together because they seem like reflexive equivalents
> to each other. If we return to our C-inspired roots, in that world it is
> substantially more common to copy data into the lifetime-tracking context
> of the code. That copying has cost, but that cost is proportional to the
> call frequency and data size... which may be better than GC's tracing
> cost, which is proportional to pointer count, allocation count, and
> program execution duration.

The point of these two categories was to distinguish "stuff that is copied into lifetime context" from "stuff that is used locally and then abandoned". The former, for example, isn't a candidate for stack allocation (because it escapes). The latter *is* a candidate, and can often be excluded from GC for that reason.

You talked later about regions. The analysis behind automatic stack allocation - escape analysis - is closely related to region analysis.

All of the mechanisms you went on to describe are reasonable mechanisms to have, but we need to be able to type them (notably handles) within the language. BitC had them all, with the exception of handles, because we anticipated that the need would be subsumed by adding a nat kind.

>> 3. Allocations that become reachable as a result of modification of some
>> argument, e.g. appending to a passed-in collection of some form [hard to
>> say]
>
> I have not thought enough about this. This lives in the hand-waving of
> what my new "ref" means for pointers hanging off an object passed by
> "uncopyable ref".

Fair enough.
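For what it's worth, the transient-vs-escaping distinction being drawn above can be shown in a few lines. A hedged Java sketch (all names invented): a category-2 transient allocation that escape analysis can keep off the GC heap, next to a category-3 allocation that escapes through a passed-in collection.

```java
import java.util.List;

// Invented illustration of the transient-vs-escaping distinction.
final class Escape {
    // Category 2 (transient): the int[] never escapes sum(), so escape
    // analysis may stack-allocate or scalar-replace it -- it need never
    // be visible to the collector at all.
    static int sum(int a, int b) {
        int[] tmp = {a, b};        // used locally, then abandoned
        return tmp[0] + tmp[1];
    }

    // Category 3 (escaping via an argument): the allocation is appended
    // to a passed-in collection, so it outlives the call and must be
    // heap-allocated and GC-traced.
    static void record(List<int[]> log, int a, int b) {
        log.add(new int[]{a, b});  // escapes through the argument
    }
}
```

The second shape is exactly the one that defeats a purely local analysis: whether the allocation is long-lived depends on what the caller does with the collection.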
The main reason it's interesting is that it wreaks havoc on information flow analysis - both for regions and for security.

Jonathan
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
