On Sat, Jul 13, 2013 at 10:38 PM, David Jeske <[email protected]> wrote:

> On Sat, Jul 13, 2013 at 3:39 PM, Jonathan S. Shapiro <[email protected]>wrote:
>
>> I think the question of library allocation is subtle. Allocation *per
>> se* isn't the problem. The problem is GC pressure, and short-lived
>> objects generate qualitatively different pressures than long-lived
>> objects.
>>
>
> On the contrary. When I said GC pressure, I am specifically thinking of
> pressure on the tracing work -- aka the absolute number of (and efficiency
> scanning) traced pointers. So allocation (of GC traced items) *is* the
> exact problem I'm thinking about.
>

Nit and no. First, the nit: the important metrics for GC performance are
the number of cache lines touched and (in a concurrent GC) the number of
objects contended (because of locking requirements). I agree that in most
GCs the number of object references is a good predictor of the number of
cache lines touched, but we shouldn't forget the lower-level view, because
we're having a discussion elsewhere about concurrent collectors.

I'm going to assume from this point forward that we are talking about some
form of copying collector. It's pretty well understood that the mark/sweep
collectors are a dead end on the evolutionary tree.

The consequence of this assumption is that allocation is not a good
predictor of tracing pressure. Tracing pressure is most directly a function
of *reachability* rather than a function of the number of allocations. More
precisely: allocations trigger collections, but the cost of collection is a
function of liveness. See? Isn't that a completely transparent and simple
thing to think about? :-)
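Since allocation triggers collections while liveness sets their cost, the
distinction is easy to see in a toy cost model. This is an illustrative
Python sketch with made-up numbers, not a model of any real collector:

```python
# Toy cost model for a copying collector: allocations *trigger*
# collections (when the allocation budget is exhausted), but the *cost*
# of each collection is proportional to the live set that must be
# copied, not to the number of allocations since the last GC.

def run_mutator(n_allocs, live_set_size, heap_budget):
    """Simulate allocations; return (collections, objects_copied)."""
    allocated_since_gc = 0
    collections = 0
    copied = 0
    for _ in range(n_allocs):
        allocated_since_gc += 1
        if allocated_since_gc >= heap_budget:  # allocation triggers GC
            collections += 1
            copied += live_set_size            # cost tracks liveness
            allocated_since_gc = 0
    return collections, copied

# Same allocation count, wildly different tracing work:
small_live = run_mutator(n_allocs=1_000_000, live_set_size=100,
                         heap_budget=10_000)     # → (100, 10000)
large_live = run_mutator(n_allocs=1_000_000, live_set_size=50_000,
                         heap_budget=100_000)    # → (10, 500000)
```

Both runs perform the same million allocations; the copying work differs
by 50x purely because of the size of the live set.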

Let's look at two concrete examples:

Concrete example 1: in the lexer I'm currently playing with (written in
C#), I have a class that represents an input position. I had a choice
between a struct and a class for this, and I chose a class. There is a
mutable field m_loc holding a reference to a LexLoc. The LexLoc itself is
treated as immutable. Within the lexer, the pattern is

   m_loc = m_loc.updateWith(s);  // s is the string being consumed

Note that the returned reference is always a newly allocated object.

So this creates memory pressure because I'm constantly allocating new
LexLocs. That in turn triggers GC. On the other hand, the location
containing the start location of the token is the initial value of m_loc
about half the time (the other half there is leading white space), and the
*final* position of the token (I track both) is *invariably* the m_loc
value at the end of a given call to GetToken().

So it's a judgment call and a balancing exercise. At a guess, I'd say that
20% of the LexLocs I produce don't survive outside of the Tokenizer. In
exchange, I don't have to keep track of which cases are which, and I
suspect that the overall reliability of the system is improved. That is:
I'm intentionally trading GC pressure for robustness, which is a
consideration that is often neglected in these discussions.
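For concreteness, here is a minimal Python rendering of that pattern. The
real code is C#, and update_with and the position fields here are
illustrative stand-ins for updateWith and whatever LexLoc actually tracks:

```python
class LexLoc:
    """Immutable input position: every update allocates a fresh object."""
    __slots__ = ("offset", "line", "col")

    def __init__(self, offset=0, line=1, col=0):
        self.offset, self.line, self.col = offset, line, col

    def update_with(self, s):
        """Return a *new* LexLoc advanced over the consumed string s."""
        line, col = self.line, self.col
        for ch in s:
            if ch == "\n":
                line, col = line + 1, 0
            else:
                col += 1
        return LexLoc(self.offset + len(s), line, col)

loc = LexLoc()
loc = loc.update_with("let x")   # the m_loc = m_loc.updateWith(s) shape
```

Each update allocates, which is the GC pressure being traded away; in
exchange any saved LexLoc is safe to hold forever, since nothing mutates
it after construction.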


Concrete example 2: In the same lexer, for various reasons, I've chosen to
implement hypothetically infinite character lookahead for use within the
tokenizer. The character lookahead lets me write tokenizers that can tell
the difference between 'a (a type variable) and 'a' (a character literal).
It turns out character I/O is pretty slow, and more so when you use UCS-2.
But I *really* don't want a push-back mechanism for that, both because of
perf cost and because of [de]allocation and GC pressure concerns. So in that
case I've implemented a large-ish input character buffer. Characters are
eaten from the front until a fill() is needed, at which point the residual
few characters are moved to the bottom and the remaining buffer is filled
from the underlying input stream, which is much more efficient than trying
to keep track of push-back.
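A sketch of that buffer scheme in Python (the real tokenizer is C#;
LookaheadBuffer and its method names are invented for illustration):

```python
import io

class LookaheadBuffer:
    """Consume characters from the front; fill() compacts the residual
    tail to the bottom and refills from the underlying stream."""

    def __init__(self, stream, capacity=4096):
        self.stream = stream
        self.buf = ""          # residual + freshly read characters
        self.pos = 0           # read cursor into buf
        self.capacity = capacity

    def fill(self):
        # Slide the residual characters down, then read more.
        residual = self.buf[self.pos:]
        fresh = self.stream.read(self.capacity - len(residual))
        self.buf = residual + fresh
        self.pos = 0

    def peek(self, k=0):
        """Look ahead k characters without consuming; '' at end of input."""
        while self.pos + k >= len(self.buf):
            available = len(self.buf) - self.pos
            self.fill()
            if len(self.buf) == available:   # stream exhausted
                return ""
        return self.buf[self.pos + k]

    def next(self):
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch

buf = LookaheadBuffer(io.StringIO("'a'"), capacity=8)
is_char_literal = buf.peek(0) == "'" and buf.peek(2) == "'"
```

Two characters of lookahead are enough to tell 'a from 'a', with no
push-back bookkeeping and no per-character allocation.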


My point in giving these two examples is mostly in support of things David
says below. There are important use-cases for reuse of storage, and there
are also well-motivated use cases for exploiting the automated resource
management provided by GC. But I'd also note that *neither* of these
choices was motivated by concerns about the performance of the GC
subsystem. The first was motivated by robustness and the second by I/O
performance considerations.


This came out of the observation that in large performant C programs,
> substantial amounts of data are often managed with stack allocation,
> struct-arrays, reference-counting, and reuse-queues. If we have a typesafe
> way to handle these cases, perhaps the number of GC traced objects/handles
> could be kept small enough to make GC tracing a trivial task.
>

BitC provided a lot of what we need, and so do the better functional
language compilers. I tend to take it as given that we will do those
optimizations. But more importantly, we need to be able to express those
uses of data structures prescriptively. I made that argument in BitC and I
still believe it strongly.


>> 1. Allocations that are reachable from the return value [young]
>
> We already know it is trivial to perform a transformation which pushes the
> allocation of these objects out to the caller. We turn "byte[] read(File
> fp)" into "void read(File fp, byte[] buf)", as is already often done for
> *certain* apis. Two challenges are that (a) using the latter is less
> convenient than the former in non-performance centric code; (b) libraries
> are free to make the wrong choice and enforce this on us.
>

I don't think the language should hide this. The particular example you
gave can be handled just fine with overloading. It's not obvious that one
way is better than the other, particularly when the programmer may be
making choices about functional vs stateful idioms for reasons other than
performance.
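In Python terms, which lack true overloading, the two shapes become a pair
of functions; read and read_into are illustrative names, not any
particular library's API:

```python
import io

def read_into(fp, buf):
    """Performance-centric shape: the caller supplies (and can reuse)
    buf. Returns the number of bytes actually read."""
    return fp.readinto(buf)

def read(fp, n):
    """Convenient shape: allocates a fresh buffer on every call."""
    buf = bytearray(n)
    got = read_into(fp, buf)
    return bytes(buf[:got])

# A hot loop allocates its scratch buffer once and reuses it:
scratch = bytearray(4)
fp = io.BytesIO(b"abcdefgh")
got = read_into(fp, scratch)   # → 4; no fresh allocation per call
```

Casual callers keep the allocating form; performance-centric code opts
into the caller-supplied-buffer form, and neither choice is forced on the
other.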


> Enforcing (b) seems much harder.
>

Again, we have to be careful about not focusing on a single metric of
"right", but tools to provide design idiom hints seem like a useful thing
to have here.


>> 2. Allocations for computational storage internal to the call [transient]
>>
>> 4. Allocations that become reachable from the internal global state of the
>> library. [potentially long lived]
>>
>
> I am grouping these together because they seem like reflexive equivalents
> of each other. If we return to our C-inspired roots, in that world it is
> substantially more common to copy data into the lifetime tracking context
> of the code. That copying has cost, but that cost is proportional to the
> call frequency and data-size... which may be better than GC's tracing cost,
> which is proportional to pointer-count, allocation-count, and
> program-execution-duration.
>

The point of these two categories was to distinguish "stuff that is copied
into the lifetime context" from "stuff that is used locally and then
abandoned". The former, for example, isn't a candidate for stack allocation
(because it escapes). The latter *is* a candidate, and can often be
excluded from GC for that reason.

You talked later about regions. The analysis behind automatic stack
allocation - escape analysis - is closely related to region analysis.

All of the mechanisms you went on to describe are reasonable mechanisms to
have, but we need to be able to type them (notably handles) within the
language. BitC had all of them except handles; we anticipated that the
need for handles would be subsumed by adding a nat kind.


>> 3. Allocations that become reachable as a result of modification of some
>> argument, e.g. appending to a passed-in collection of some form [hard to
>> say]
>>
>
> I have not thought enough about this. This lives in the hand-waving of
> what my new "ref" means for pointers hanging off an object passed by
> "uncopyable ref".
>

Fair enough. The main reason it's interesting is because it wreaks havoc on
information flow analysis - both for regions and for security.


Jonathan
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
