In thinking about the lexing problem, I spent some time diverted on the idea
that tokens would be assembled into a (linked) list, which would eventually
be interned as a string. Reflection shows that this isn't the right way to
handle matters, but thinking about it pointed me at a concurrency problem.

Assume, for a moment, that we want to intern strings from lists of
characters or (equally good) from vectors of characters. In each case, the
data structure is heap-based, and there exists the possibility (in abstract)
that the reference to the data structure may be shared. In consequence, we
must consider the possibility that the data structure is subject to
concurrent mutation. In the presence of this hazard, how do we implement
intern without performing a localizing copy?

Answer: we can't. In fact, the absence of any ability to make statements
about concurrency hazards is a major pain in the ass, and the best answer
lies in being able to make statements about immutability.

And note here that I truly mean "immutability", as distinct from the object
integrity view that Mark Miller advocates. For intern, we most critically
need to be able to get a stable notion of the length of the interned string.
In the case of vectors that is inherent, but in the case of lists it relies
on the "next" field being immutable.

So how does this relate to the integrity view?

If a field is private, and if the compiler has insight into the source code
of all member functions, then we can successfully infer that no member
function permits that field to be mutated. That is: we can infer
immutability of a field from the source code of the "member" functions.
Which is no surprise, since Swaroop's work establishes that mutability is
inferrable.

So what's the problem here?

BitC doesn't have member functions! Even if we did, inferring the
immutability of the field is a consequence of the *source code* of the
member functions. Knowing their signatures is insufficient! So in the end,
the notion that a field is immutable (or more precisely: init-only) actually
*does* need to be part of the type system. Following a brief phone
conversation with MarkM, he agrees, so we're set back to an earlier state.

And here I was well on the way to removing field mutability! Crap!

There is another nasty implication as well: we need to be very careful about
inner references and references into the stack.

Why?

Even if a field is init-only, the containing structure as a whole may be
mutable. So imagine that we have a CONS cell on the stack that becomes part
of a list. I assume that the list has suitable lifetime restrictions to
guarantee memory safety (e.g. via region analysis).

So now we have a list containing a stack-allocated CONS cell, which means
that even if the CAR and CDR fields are init-only, the CONS cell can be
overwritten as a whole, and in consequence the length of the list as a whole
is not derived from immutable state.

This is not intrinsically good or bad, but it means that init-only fields do
not provide a sufficient justification to declare that the output of a
LENGTH() function is a pure function of its inputs.

So that's the problem. I'll try in a reply to this note to speak to possible
resolutions.

No surprise here, in some sense: BitC is trying to reconcile a bunch of hard
issues, so it shouldn't be a surprise that accomplishing the reconciliation
is frustrating and difficult.

shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to