I should state up front: I'm not trying to enable transparent reference counting!
I'm thinking about how to make very effective patterns from C typesafe. In C these things are often packed arrays of structs, with direct non-safe pointers to each other, or merely non-typesafe integer array indices. Today, if I want to code these same strategies in a typesafe manner, including the manually coded reference counting and lifetime tracking, I don't see a language which gives me the facilities to do so. I hope that correction makes the rest of what I write here clearer.

On Sat, Jul 13, 2013 at 3:39 PM, Jonathan S. Shapiro <[email protected]> wrote:

> I think the question of library allocation is subtle. Allocation *per se*
> isn't the problem. The problem is GC pressure, and short-lived objects
> generate qualitatively different pressures than long-lived objects.

On the contrary. When I said GC pressure, I was specifically thinking of pressure on the tracing work -- i.e. the absolute number of (and the efficiency of scanning) traced pointers. So allocation (of GC-traced items) *is* the exact problem I'm thinking about. This came out of the observation that in large performant C programs, substantial amounts of data are often managed with stack allocation, struct arrays, reference counting, and reuse queues. If we have a typesafe way to handle these cases, perhaps the number of GC-traced objects/handles could be kept small enough to make GC tracing a trivial task.

> My sense is that there are four different kinds of allocation that are
> performed in library code:

I think your categories are a good breakdown. Here are some comments on each as they relate to my thinking above... code samples in C#.

> 1. Allocations that are reachable from the return value [young]

We already know it is trivial to perform a transformation which pushes the allocation of these objects out to the caller. We turn "byte[] read(File fp)" into "void read(File fp, byte[] buf)", as is already often done for *certain* APIs.
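In plain C, that transformation looks roughly like the sketch below (the names read_alloc/read_into are hypothetical, just to contrast the two shapes -- not any real API):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Allocating form: the callee allocates, the result escapes to the
 * caller, and (under a GC) every call adds one more traced object. */
char *read_alloc(const char *src, size_t *out_len) {
    size_t n = strlen(src);
    char *buf = malloc(n + 1);
    if (buf) {
        memcpy(buf, src, n + 1);
        *out_len = n;
    }
    return buf;
}

/* Caller-allocated form: the caller owns the storage (often a stack
 * buffer or a reused one), so the call itself allocates nothing.
 * Returns the number of bytes written, truncating to fit cap. */
size_t read_into(const char *src, char *buf, size_t cap) {
    size_t n;
    if (cap == 0) return 0;
    n = strlen(src);
    if (n + 1 > cap) n = cap - 1;
    memcpy(buf, src, n);
    buf[n] = '\0';
    return n;
}
```

A caller can then hoist one buffer out of a loop (e.g. `char buf[4096];` before `while (...) { read_into(src, buf, sizeof buf); ... }`), which is exactly the rewrite discussed below.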
Two challenges are that (a) using the latter is less convenient than the former in non-performance-centric code; and (b) libraries are free to make the wrong choice and force it on us.

For (a), I'm swirling around thoughts on making the language hide this kind of detail behind some syntax sugar. To illustrate, this code...

    while (fp.hasData) {
        doSomething(fp.read());
    }

would actually be compiled to something resembling...

    while (fp.hasData) {
        byte[] __temp = new byte[4000];
        fp.read(__temp);
        doSomething(__temp);
    }

Much like the way Stackless or C# async attempts to hide async I/O patterns behind a synchronous-looking facade, the above tries to hide allocation-free libraries behind a GC-heap facade. Now, suppose we extend the CLR "ref" argument decorator to be usable on any object, meaning "a pointer to something which cannot be copied during the call and thus has no lifetime requirements outside the call". If doSomething's argument is a ref, then the compiler itself could turn the above code into:

    byte[4000] __temp; // stack-allocated version
    while (fp.hasData) {
        fp.read(ref __temp);
        doSomething(ref __temp);
    }

In other cases, we could manually code the non-allocating version of the function calls, because we want explicit control and we know what we are doing with buffers.

Enforcing (b) seems much harder. The best I have so far is the hope that, through the slightly extended scope for "ref" and the above convenience mechanisms, we would intentionally write our libraries to be more allocation-free.

> 2. Allocations for computational storage internal to the call [transient]

> 4. Allocations that become reachable from the internal global state of the
> library. [potentially long lived]

I am grouping these together because they seem like reflexive equivalents of each other. If we return to our C-inspired roots, in that world it is substantially more common to copy data into the lifetime-tracking context of the code.
That copying has cost, but that cost is proportional to the call frequency and data size... which may be better than GC's tracing cost, which is proportional to pointer count, allocation count, and program execution duration.

I'm guessing you are going to look at this and see overlap with your focus on regions, and I think that is very true. If there is a distinction, my thoughts are staying away from what I perceive as "magic" mechanisms which allow "normal" objects and "normal" pointers to work for either heap-GC or region spaces. I'm thinking of a C#-like class/struct dichotomy which gives us more convenient mechanisms to manage typesafe collections of mutable structs which are not seen by the GC. That mechanism is made up of:

  "structs" - aggregate value types
  "packed arrays of structs"
  "refs" - pointer function arguments which are guaranteed not to be copied, and thus not to extend lifetime beyond the call (like C# struct refs, but usable for any type)
  "handles" - typesafe non-pointer methods of addressing elements of struct arrays... could be as simple as a typesafe array-of-structs index, with some optional checking to prevent access to stale handles

This is a bit half-baked, but are you following where I'm headed?

> 3. Allocations that become reachable as a result of modification of some
> argument, e.g. appending to a passed-in collection of some form [hard to
> say]

I have not thought enough about this. It lives in the hand-waving of what my new "ref" means for pointers hanging off an object passed by "uncopyable ref".
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
