At long last!
The discussions about mixing pure and impure programming, and the problem of
the "missing syntactic position" pushed me to think much more carefully
about whether it makes sense to have both boxed and unboxed types in the
language. If boxed types can be captured semantically by some combination of
unboxed types, references, and supporting sugar, that might provide a
resolution, or at least suggest a path. C and C++ certainly seem to get
along without any notion of boxed types, so it seemed worth thinking about.
Due to various distractions it's taken a while, but I now think I have an
answer, and I want reactions and thoughts about it.
Just as background, the approach I was playing with would take an unboxed
type like List('a), construct a corresponding *unboxed* type #List('a), and
then created a "preferred" type alias:
typealias List('a) = ref #List('a)
This typealias is preferred in the sense that it is a syntactic sugaring
that the compiler recognizes for output purposes. The type "ref #blah(...)"
is externally reported as "blah(...)". Carrying this bit of sugar through,
The consructors of List('a) turn into wrappers for the corresponding
constructors of #List('a). So we can restate the original question as: if
this sugaring is carried through appropriately, is that sufficient to let us
remove boxed types from the core semantics of BitC?
The answer is tantalizingly close to "yes", but it doesn't seem to be quite
there - or at least, there are some corner cases that I haven't been able to
smooth out enough yet. All of them seem to center around issues of equality
and identity.
So first, note that in the traditional unboxed implementation, Nil('a) does
not occupy storage. There is a distinguished reference value denoting
Nil('a), but the implementation exploits the fact that Nil('a) has no
fields, and that a reference to Nil('a) therefore cannot be dereferenced. It
is necessary to ensure that the reference to Nil('a) cannot be confused with
any other reference, but this can be (and is) done in any of several ways.
But what about #Nil('a)? Well, that certainly *does* occupy storage, because
it is an instance of #List('a), which is an unboxed type having non-zero
size. So since /nil/ is now attempting to be syntactic sugar for
ref(#Nil('a)), it would appear that Nil('a) now occupies storage. There are
certainly cases where we can eliminate the instance by constant propagation,
but probably not all such instances. And this bit of syntactic sugar has
subtle implications.
Is dup(#Nil) =eq= Nil guaranteed to be boolean true? If so, is this
accomplished by a subtyping illusion or is it accomplished by imposing a
constraint on the behavior of dup() when applied to union values
corresponding to a nullary constructor?
And if the desired answer is "yes", then is deref(#Nil) =eq= Nil guaranteed
to be boolean true? I think the answer to this has to be "no", because if a
structure contains a field of union type, we may be trying here to capture
the address of a field, and we want to be able to use that later for address
inclusion testing. It doesn't matter whether the operation in question is
called deref() or AddressOf(). The point is that its return type certainly
cannot be ref(#List('a)) [which is to say: List('a)], because we cannot
simultaneously preserve the desired behavior of AddressOf and Eq on nil.
And finally, consider that deref(ref(someList)) returns an assignable
location, so deref(ref(#Nil)) is an assignable location. Which means it must
be backed by storage, and further means that two instances of ref(#Nil)
probably had better not be Eq.
And this last point, I think, makes it pretty clear that not all references
are lvalues in disguise.
Also, it makes it pretty clear that the missing "syntactic position" where
we might insert a mutable keyword in a boxed type is not just a syntactic
omission - it encapsulates a semantic issue as well.
So from this we can draw at least the following conclusions:
1. Boxed types cannot be entirely replaced by references to unboxed types.
There is a significant semantic distinction where unions having nullary
constructors exist, so the two are not equivalent.
2. In consequence, the missing syntactic position for "mutable" actually *
needs* to be missing, lest we get into the position where we are forced to
choose between a faithful implementation of Eq and a faithful implementation
of AddressOf for boxed types.
3. There does NOT appear to be any corresponding problem for boxed structs,
because structs never have nullary constructors.
4. DEREF() cannot sensibly be applied to a boxed union.
5. Though it is perfectly okay to *permit* fields on unions (i.e. fields
shared by all legs), it is not obviously okay to subsume structs as unions
that have only a single leg. That needs further thought.
6. Curiously, none of this appears to violate the intuition that a union
is a sum of structs - the individual struct legs can still sensibly be
wholly mutable or wholly constant.
We can now resume discussion of whether pure types, in some fashion, are
useful.
shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev