In thinking about the lexing problem, I spent some time diverted on the idea that tokens would be assembled into a (linked) list, which would eventually be interned as a string. Reflection shows that this isn't the right way to handle matters, but thinking about it pointed me at a concurrency problem.
Assume, for a moment, that we want to intern strings from lists of characters or (equally good) from vectors of characters. In each case, the data structure is heap-based, and there exists the possibility (in abstract) that the reference to the data structure may be shared. In consequence, we must consider the possibility that the data structure is subject to concurrent mutation. In the presence of this hazard, how do we implement intern without performing a localizing copy? Answer: we can't. In fact, the absence of any ability to make statements about concurrency hazards is a major pain in the ass, and the best answer lies in being able to make statements about immutability. And note here that I truly mean "immutability", as distinct from the object integrity view that Mark Miller advocates. For intern, we most critically need to be able to get a stable notion of the length of the interned string. In the case of vectors that is inherent, but in the case of lists it relies on the "next" field being immutable. So how does this relate to the integrity view? If a field is private, and if the compiler has insight into the source code of all member functions, then we can successfully infer that no member function permits that field to be mutated. That is: we can infer immutability of a field from the source code of the "member" functions. Which is no surprise, since Swaroop's work establishes that mutability is inferrable. So what's the problem here? BitC doesn't have member functions! Even if we did, inferring the immutability of the field is a consequence of the *source code* of the member functions. Knowing their signatures is insufficient! So in the end, the notion that a field is immutable (or more precisely: init-only) actually *does* need to be part of the type system. Following a brief phone conversation with MarkM, he agrees, so we're set back to an earlier state. And here I was well on the way to removing field mutability! Crap! There is another nasty implication as well: we need to be very careful about inner references and references into the stack. Why? Even if a field is init-only, the containing structure as a whole may be mutable. So imagine that we have a CONS cell on the stack that becomes part of a list. I assume that the list has suitable lifetime restrictions to guarantee memory safety (e.g. via region analysis). So now we have a list containing a stack-allocated CONS cell, which means that even if the CAR and CDR fields are init-only, the CONS cell can be overwritten as a whole, and in consequence the length of the list as a whole is not derived from immutable state. This is not intrinsically good or bad, but it means that init-only fields do not provide a sufficient justification to declare that the output of a LENGTH() function is a pure function of its inputs. So that's the problem. I'll try in a reply to this note to speak to possible resolutions. No surprise here, in some sense: BitC is trying to reconcile a bunch of hard issues, so it shouldn't be a surprise that accomplishing the reconciliation is frustrating and difficult. shap
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
