I should state up front that I'm not trying to enable transparent reference
counting!

I'm thinking about how to make some very effective patterns from C typesafe.

In C these things are often packed arrays of structs, with direct unsafe
pointers to each other, or merely non-typesafe integer array indices.
Today, if I want to code these same strategies in a typesafe manner,
including the manually coded reference counting and lifetime tracking, I
don't see a language which gives me the facilities to do so.

I hope that correction will make the rest of what I write here clearer.

On Sat, Jul 13, 2013 at 3:39 PM, Jonathan S. Shapiro <[email protected]> wrote:

> I think the question of library allocation is subtle. Allocation *per se* 
> isn't
> the problem. The problem is GC pressure, and short-lived objects generate
> qualitatively different pressures than long-lived objects.
>

On the contrary. When I said GC pressure, I was specifically thinking of
pressure on the tracing work -- i.e. the absolute number of traced pointers
(and the efficiency of scanning them). So allocation (of GC-traced items)
*is* the exact problem I'm thinking about.

This came out of the observation that in large performant C programs,
substantial amounts of data are often managed with stack allocation,
struct-arrays, reference-counting, and reuse-queues. If we have a typesafe
way to handle these cases, perhaps the number of GC traced objects/handles
could be kept small enough to make GC tracing a trivial task.
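To make the C baseline concrete, here is a minimal sketch (all names
hypothetical) of the struct-array-plus-reuse-queue pattern I mean: a packed
array of structs with manual reference counting and an intrusive free list.
Note that nothing in C stops a caller from using a stale or wrong integer
index -- that is exactly the type-safety gap.

```c
#include <assert.h>

/* A packed array of structs with an intrusive free list ("reuse queue").
   Slots refer to each other by plain integer index; freed slots are
   recycled rather than returned to a general-purpose allocator. */
typedef struct {
    int    refcount;
    int    next_free;   /* index of next free slot, or -1 */
    double payload;
} Node;

#define POOL_SIZE 64
static Node pool[POOL_SIZE];
static int  free_head = -1;

static void pool_init(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].refcount  = 0;
        pool[i].next_free = (i + 1 < POOL_SIZE) ? i + 1 : -1;
    }
    free_head = 0;
}

/* Returns a slot index with refcount 1, or -1 if the pool is exhausted. */
static int node_alloc(void) {
    if (free_head < 0) return -1;
    int i = free_head;
    free_head = pool[i].next_free;
    pool[i].refcount = 1;
    return i;
}

static void node_release(int i) {
    if (--pool[i].refcount == 0) {
        pool[i].next_free = free_head;   /* push back on the reuse queue */
        free_head = i;
    }
}
```

None of these slots are ever visible to a tracing GC; the cost of the
scheme is the manual discipline, which is what a typesafe formulation
would have to replace.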


> My sense is that there are four different kinds of allocation that are
> performed in a library code:
>

I think your categories are a good breakdown. Here are some comments on
each as they relate to my thinking above. Code samples are in C#.


> 1. Allocations that are reachable from the return value [young]
>

We already know it is trivial to perform a transformation that pushes the
allocation of these objects out to the caller: we turn "byte[] read(File
fp)" into "void read(File fp, byte[] buf)", as is already often done for
*certain* APIs. Two challenges are that (a) using the latter is less
convenient than the former in non-performance-centric code; and (b)
libraries are free to make the wrong choice and force it on us.

For (a), I'm swirling around thoughts on having the language hide this kind
of detail behind some syntactic sugar. To illustrate, this code...

  while (fp.hasData) {
     doSomething(fp.read());
  }

Would actually be compiled to something resembling..

  while (fp.hasData) {
     byte[] __temp = new byte[4000];
     fp.read(__temp);
     doSomething(__temp);
  }

Much like the way Stackless or C# async attempts to hide async I/O patterns
behind a synchronous-looking facade, the above tries to hide
allocation-free libraries behind a GC-heap facade. Now, suppose we extend
the CLR "ref" argument decorator to be usable on any object, meaning "a
pointer to something which cannot be copied during the call and thus has no
lifetime requirements outside the call". If doSomething's argument is such
a ref, then the compiler itself could turn the above code into:

  byte[4000] __temp;  // stack allocated version
  while (fp.hasData) {
     fp.read(ref __temp);
     doSomething(ref __temp);
  }

In other cases, we could manually code against the non-allocating version
of the function calls, because we want explicit control and we know what we
are doing with buffers.

Addressing (b) seems much harder. The best I have so far is the hope that,
with the slightly extended scope for "ref" and the convenience mechanisms
above, we would intentionally write our libraries to be more
allocation-free.


> 2. Allocations for computational storage internal to the call [transient]
>
> 4. Allocations that become reachable from the internal global state of the
> library. [potentially long lived]
>

I am grouping these together because they seem like reflexive equivalents
of each other. If we return to our C-inspired roots: in that world it is
substantially more common to copy data into the lifetime-tracking context
of the receiving code. That copying has a cost, but the cost is proportional
to call frequency and data size... which may be better than GC's tracing
cost, which is proportional to pointer count, allocation count, and
program execution duration.
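As a concrete sketch of the copy-into-the-library's-context style (names
hypothetical), here is a "category 4" library in C that retains data across
calls by copying it into storage the library itself owns, rather than
keeping a pointer into the caller's memory:

```c
#include <assert.h>
#include <string.h>

/* The library retains data across calls, but by value: it copies the
   caller's bytes into internal storage, so no pointer to caller memory
   escapes the call. The copy costs time proportional to data size and
   call frequency, but creates nothing a tracing GC would have to scan. */
#define NAME_MAX 32
#define LOG_MAX  16

static char log_entries[LOG_MAX][NAME_MAX];
static int  log_count = 0;

/* Copies `name` into internal storage and returns its entry index, or -1
   if the log is full. The caller's buffer may be stack-allocated and
   freely reused as soon as the call returns. */
static int log_event(const char *name) {
    if (log_count >= LOG_MAX) return -1;
    strncpy(log_entries[log_count], name, NAME_MAX - 1);
    log_entries[log_count][NAME_MAX - 1] = '\0';
    return log_count++;
}
```

Because nothing in the library aliases the caller's memory, the caller's
buffer needs no lifetime beyond the call -- which is exactly what the
extended "ref" above would let the type system guarantee.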

I'm guessing you are going to look at this and see overlap with your focus
on regions, and I think that is very true. If there is a distinction, it is
that my thoughts stay away from what I perceive as "magic" mechanisms which
allow "normal" objects and "normal" pointers to work in either the GC heap
or region spaces. I'm thinking of a C#-like class/struct dichotomy which
gives us more convenient mechanisms for managing typesafe collections of
mutable structs that are not seen by the GC. That mechanism is made up of:

"structs" - (aggregate value types)
"packed arrays of structs"

"refs" - pointer function arguments which are guaranteed not to be copied,
and thus not to extend lifetime beyond the call (like C# struct refs, but
usable for any type)

"handles" - typesafe non-pointer methods of addressing elements of
struct-arrays ... could be as simple as a typesafe array-of-structs index,
with some optional checking to prevent access to stale handles
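The handle idea, including the optional staleness check, can be sketched
even in plain C (all names hypothetical): wrap the index in a distinct
struct type so it can't be confused with other integers, and pair it with a
per-slot generation counter that is bumped on free, so access through a
stale handle is caught rather than silently reading reused memory. A real
typesafe version would get the wrapping for free from the type system.

```c
#include <assert.h>
#include <stddef.h>

/* A handle: a struct-wrapped index plus a generation counter. */
typedef struct { int index; unsigned gen; } WidgetHandle;

typedef struct {
    unsigned gen;    /* bumped every time the slot is freed */
    int      live;
    int      value;
} WidgetSlot;

#define SLOTS 8
static WidgetSlot slots[SLOTS];

static WidgetHandle widget_create(int value) {
    for (int i = 0; i < SLOTS; i++) {
        if (!slots[i].live) {
            slots[i].live  = 1;
            slots[i].value = value;
            return (WidgetHandle){ i, slots[i].gen };
        }
    }
    return (WidgetHandle){ -1, 0 };   /* pool exhausted */
}

static void widget_destroy(WidgetHandle h) {
    if (h.index >= 0 && slots[h.index].gen == h.gen) {
        slots[h.index].live = 0;
        slots[h.index].gen++;         /* invalidates outstanding handles */
    }
}

/* Returns NULL for stale or invalid handles instead of touching a slot
   that may have been reused for something else. */
static int *widget_get(WidgetHandle h) {
    if (h.index < 0 || slots[h.index].gen != h.gen || !slots[h.index].live)
        return NULL;
    return &slots[h.index].value;
}
```

After a slot is destroyed and reused, an old handle to it fails the
generation check, while the fresh handle still works.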

This is all a bit half-baked, but are you following where I'm headed?


> 3. Allocations that become reachable as a result of modification of some
> argument, e.g. appending to a passed-in collection of some form [hard to
> say]
>

I have not thought enough about this one. It lives in the hand-waving over
what my new "ref" means for pointers hanging off an object passed by
"uncopyable ref".
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
