Hi Sean --
Reading this response (which I only got around to today, sorry), I felt
like it was a (clear) restatement of your proposal, but one that left most
of the key questions in my mind and mail unanswered. I'm convinced of the
value of variants, and understand that they can subsume enums, but I
don't feel convinced that they ought to.
To that end, let me iterate back through some of the open questions from
my previous mail (reordering and rephrasing slightly to work in this
context), and I'll attempt to summarize what I think your position is as
best I can, along with what the outstanding questions continue to be for
me.
(Terminologically, I'm going to use 'union' and 'enum' below as shorthand
for the two historical types of variants that we're considering
supporting. This isn't meant to imply that I'm hoping to preserve "union"
in any way (keyword or concept), just a shorthand term for getting a
particular use-case across.)
Q: Whether one or two concepts, what specific syntax should be used
for sum types?
- keyword(s)
- solve commas (list) vs. semicolons (fields)
- other...
(This one's actually the least of my concerns, but it's an obvious
starting point).
A: You proposed 'variant' for the keyword, which I like pretty well
(definitely much better than my lame 'overlap record' placeholder).
No deep objections there.
You seem to be proposing commas for the options in the variant,
though I remain unconvinced that this is natural (in that commas
don't usually suggest "choose one of these" in Chapel). Of course,
neither do semicolons -- I think we just followed C's union's lead
here, arguably. Perhaps we should use something new here, like '|'?
In the "other..." category, syntactically, you're still proposing the
composite type concept, but let me get to that when we get to the
questions related to it...
Q: Are there practical, compelling use cases of this kind of mixing
(enum-ish thing and union-ish thing in one concept rather than two)
or is this strictly academic?
I believe you're endorsing one concept for both cases, but I still don't
feel like I've seen a practical/compelling use case that wants a
combination of enum-style and union-style variants. It seems to me that
the examples we've seen so far fall into one of these cases:
1) strictly enum-ish (e.g., list three colors); or
2) strictly union-ish (i.e., only one of these types is active); or
3) a union with one "boh" field (an Italian term for, essentially, "don't
ask me, I don't know"), which feels to me like it could just be
considered a union where one of the options is of void type (so this
lands back into case 2 for me); or
4) artificial (e.g., the "other color" example I made up, which doesn't
actually come from any practical experience that leads me to want
this, but was just trying to create something).
Specifically, what I think I'm missing is a compelling case that has
multiple enum-style cases and one or more cases with specified types.
Without that (and maybe even with it), I'd be inclined to continue to
support two separate concepts, an 'enum' for case 1 and a 'variant' for
cases 2 and 3.
Q: Should enums and sum types be one language concept or two?
Assuming I'm correct that you're proposing a single type for both cases,
then I think the following questions remain largely unanswered:
- If one...
Q: How do we resolve backwards compatibility with the current enum
concept? Or do all programs that use enums just break?
Q: Do all of the use cases [for how we use enums today, listed
in the previous mail and again below in this response] make
sense for sum types as well? (I think an answer to this
question needs to, essentially, iterate through the list of
use cases and say how they'd work for the more general variant
case).
Q: Are we in any danger of making C-style enums heavier weight
than they need to be?
(To this one, I think you implied that you don't think there's
a danger to making C-style enums heavier-weight, which I can
believe).
My hypothesis (though I haven't worked through the exercise) is that the
current uses of enums won't all translate to more general variants
cleanly, which leads me to think that we should support two concepts --
the enum giving backwards compatibility, supporting all current use cases
of enums, and providing a familiar concept for C programmers (including
the interoperability benefits); and the variant giving the more general
benefits of variants, yet without the burden of subsuming current enum use
cases. But I'm open to being proven wrong (I'm just not going to be the
one to do it :).
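
To make that hypothesis a little more concrete, here is a minimal sketch in Rust (not Chapel, since the variant syntax under discussion doesn't exist yet). It is only a point of comparison, not a claim about how Chapel would behave. The names Color, ColorOrOther, ALL and from_name are made up for illustration. It shows three of the enum uses listed at the bottom of this mail (integer conversion, symbolic names, iterating over all values) working for a unit-only enum, and notes where they stop being well-defined once one case carries data:

```rust
// Hypothetical illustration; none of these names come from the proposal.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 3,
}

impl Color {
    // Listing every value is possible because no case carries data.
    const ALL: [Color; 3] = [Color::Red, Color::Green, Color::Blue];

    // Symbolic input: look a value up by its (case-insensitive) name.
    fn from_name(name: &str) -> Option<Color> {
        Color::ALL
            .iter()
            .copied()
            .find(|c| format!("{:?}", c).eq_ignore_ascii_case(name))
    }
}

// Once one case carries a payload, there is no longer a finite list of
// values to iterate over, and Rust rejects an `as i32` cast on this type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ColorOrOther {
    Red,
    Green,
    Blue,
    Other { name: String },
}

fn main() {
    // Lightweight conversion to an integer.
    println!("{}", Color::Green as i32);
    // Symbolic input and output.
    println!("{:?}", Color::from_name("blue"));
    // Iteration over all values.
    for c in Color::ALL.iter() {
        println!("{:?} = {}", c, *c as i32);
    }
    // The mixed type can still be built and printed, but not cast or enumerated.
    let teal = ColorOrOther::Other { name: String::from("teal") };
    println!("{:?}", teal);
}
```

A language could of course define answers for the mixed case (e.g., cast only the tag), but each item in the list would need that kind of decision.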
Q: What is it (if anything) about sum types that wants anonymous records
more than other contexts?
Q: If we took the "single field/type" approach illustrated at the top of
this section and then added an anonymous record concept, would that
give us everything we wanted?
Q: Are tuple types insufficient for cases like these? If so, why?
These questions remain unanswered for me. You proposed a composite type,
but without motivating it well enough for me to understand why we would
want it (Chapel's plenty big without adding yet another kind of type).
Specifically, it's not clear to me how a composite differs from a tuple
type or anonymous record, so if I could get away with those, it'd be one
fewer new language concept we'd need. If it's most similar to an
anonymous record, it's not clear to me what would be lost (semantically; I
understand the convenience benefits) if you were required to create a
record and then name that record type rather than use a composite. If
composites are different from an anonymous record, it's not clear to me
why, nor why you wouldn't want to use them in other cases ("I'd like to
create an array of composites"). And I continue not to see how the two
discussions (variants and composites/anonymous records) are related, other
than that it would make Chapel more Rust-like.
Left to my own devices, I'd start with the proposition that each option in
a variant can only have a single type, require users in the short-term to
handle composite cases via record types, and get that accepted and
implemented. Then, independently (maybe in parallel), I'd make a general
case for supporting anonymous records or composites in the language. Or
does something compel us to tackle both at once in the same discussion?
(which I think has the downside of muddying the water and raising the
amount of activation energy required to get either adopted).
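
As a rough sketch of the "single type per option" starting point, again in Rust as a stand-in for syntax Chapel doesn't have: the composite case is handled by a separately declared, named record type. ErrorInfo, parse_num and describe are hypothetical names, and the position value is a placeholder.

```rust
// The composite payload is an ordinary named record, declared on its own.
#[derive(Debug, PartialEq)]
struct ErrorInfo {
    message: String,
    position: usize,
}

// Each option of the sum type holds exactly one type.
#[derive(Debug, PartialEq)]
enum NumOrError {
    Num(i64),
    Error(ErrorInfo),
}

// Hypothetical helper: produce one option or the other.
fn parse_num(text: &str) -> NumOrError {
    match text.trim().parse::<i64>() {
        Ok(n) => NumOrError::Num(n),
        Err(e) => NumOrError::Error(ErrorInfo {
            message: e.to_string(),
            position: 0, // placeholder; a real parser would track this
        }),
    }
}

fn describe(r: &NumOrError) -> String {
    match r {
        NumOrError::Num(n) => format!("number {}", n),
        NumOrError::Error(info) => {
            format!("error at {}: {}", info.position, info.message)
        }
    }
}

fn main() {
    println!("{}", describe(&parse_num("42")));
    println!("{}", describe(&parse_num("forty-two")));

    // Because the record is named, it is usable outside the sum type too,
    // e.g. in an array of errors.
    let log: Vec<ErrorInfo> = vec![ErrorInfo {
        message: String::from("unexpected token"),
        position: 7,
    }];
    println!("{:?}", log);
}
```

What is given up relative to inline fields is the convenience of not having to name ErrorInfo; semantically nothing seems to be lost in this sketch.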
Q: Is 'select' the right keyword for determining which field of a sum type
is active given that it doesn't have this meaning in any other context?
(i.e., such cases can't be re-written as a chained conditional and are
more like a meta-programming/pattern-matching construct). Should some
other keyword be used to distinguish this case instead? Could this
keyword be unified with whatever syntax is used to declare a sum type
itself? (e.g., 'variant select'?)
I think you asserted that 'select' was natural here, but without arguing
that it matches current interpretations of 'select' (as being equivalent
to a chained conditional).
Specifically, you wrote:
> select x {
> when Num{value} do ...
> when Error{msg, pos} do ...
> }
but this seems pretty different from current selects, in that it isn't
equivalent to:
if (x == Num{value}) {
} else if (x == Error{msg, pos}) {
}
but is more of a pattern match/"open up my variant safely please" concept.
That's what the question above was trying to get at -- is it different
enough that it warrants some other concept to open it up?
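
The distinction can be sketched in Rust, whose 'match' plays both roles; this is an illustration under that assumption, not a statement about Chapel's 'select'. Color, NumOrError, name_chain, name_match and describe are hypothetical names. For a unit-only enum the match really is interchangeable with an equality chain. For the payload case the arms introduce new bindings (value, msg, pos), so there is no value on the right-hand side of an '==' to compare against:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Green,
    Blue,
}

#[derive(Debug, PartialEq)]
enum NumOrError {
    Num { value: i64 },
    Error { msg: String, pos: usize },
}

// Value comparison written as a chained conditional ...
fn name_chain(c: Color) -> &'static str {
    if c == Color::Red {
        "red"
    } else if c == Color::Green {
        "green"
    } else {
        "blue"
    }
}

// ... and the same logic written as a select-style construct.
fn name_match(c: Color) -> &'static str {
    match c {
        Color::Red => "red",
        Color::Green => "green",
        Color::Blue => "blue",
    }
}

// Here each arm both tests which case is active and binds its fields.
// `value`, `msg` and `pos` do not exist before the arm is entered.
fn describe(x: &NumOrError) -> String {
    match x {
        NumOrError::Num { value } => format!("num {}", value),
        NumOrError::Error { msg, pos } => format!("error '{}' at {}", msg, pos),
    }
}

fn main() {
    for c in [Color::Red, Color::Green, Color::Blue].iter() {
        println!("{:?}: {} / {}", c, name_chain(*c), name_match(*c));
    }
    println!("{}", describe(&NumOrError::Num { value: 3 }));
    println!(
        "{}",
        describe(&NumOrError::Error { msg: String::from("bad digit"), pos: 2 })
    );
}
```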
Then there were a bunch of other questions on select and generic variants
which remain open, but those are currently second-order concerns for me,
so I won't bother re-listing them here -- we can come back to them if we
work through the above...
Re-list of the current uses of enum just below.
-Brad
How Chapel codes use enums today (to the best of Brad's knowledge):
* value comparisons in conditions and when clauses (probably obvious)
* ability to convert to/from ints in lightweight manner (coercion/cast)
* ability to cast enums to/from strings
* ability to do symbolic I/O with console/files (unlike C) -- i.e.,
writeln(myColor); will result in 'green' or 'blue' rather than '4'
or '2'. Ditto on input.
* ability to create domains (index sets) over set of possible values
and arrays over those domains
* support for 'config var's and 'config const's of enum type which
support specifying the values via strings.
* support for 'param's and 'config param's of enum type (suggests support
for compile-time reasoning about values, similar to param integers)
* ability to create sync vars of enum type because they're known to
be singleton, built-in values (sync vars of records and other
types that have multiple components are in the process of being
deprecated)
* automatic downcast semantics similar to param values (i.e., passing
an enum whose values all fit within int(8) to a routine expecting
int(8) even though the enum might otherwise be considered the size of
a default int)
* ability to iterate over all values (undocumented, but used in the
implementation I believe -- and seems helpful to provide to users)
* ability to define secondary methods on enums
* (proposed, but not implemented) ability to specify an int size that
should be used to represent the enum (would be equivalent to a
param of that int size for dispatch purposes; var of that size for
execution time purposes)
* (proposed, but not implemented) ability to define 'extern enum's
in order to interoperate with enums from C within Chapel;
symmetrically, one might want to export Chapel enums to C.
* (proposed, but not implemented) ability to 'use' enum to inject its
symbols into current scope. This would allow one to do:
use Color;
...red...
rather than:
...Color.red...
On Thu, 29 Jan 2015, Sean Billig wrote:
> Hi Brad,
>
> Thanks for taking the time to write this all up to keep me in the loop on
> the internal discussions.
>
> Like you mentioned, the motivation here is to fix/improve "union", and to
> introduce a possible way of dealing with error cases. I think we can
> accomplish both by replacing the current "union" construct completely, with
> something more elegant and useful than "C-style unions with a way to tell
> which field is active", while maintaining the performance characteristics.
>
> That said, the replacement I'm proposing makes the most sense (to me,
> anyway) when thought of as an extension of C's enum. Extending enum, or
> absorbing enum into this new concept, isn't actually the goal, it's just a
> pleasant side-effect. We'll get back to the idea that this is a "union"
> replacement after introducing it as an "enum" extension.
>
> I'll call it "variant".
>
>
> Consider:
>
> enum Color { Red, Green, Blue }
> (Ignore for now the idea of Red = 1, etc.)
>
> Red, Green, and Blue can be thought of as subtypes of Color, like a very
> simple class hierarchy. With enums, we're limited to unit types, so we
> can't attach any additional information to any of the subtypes, and we're
> obviously limited to a single level of subtypes in our hierarchy. But when
> this is all we need, the enum is the perfect solution, with great
> performance.
>
> Now, if we want to add an "Other" color, with some information attached to
> it, enum no longer suffices, so we could rewrite this as a class hierarchy:
>
> class Color {}
> class Red : Color {}
> class Green : Color {}
> class Blue : Color {}
> class Other : Color {
> var name: string;
> }
>
> Red, Green, and Blue are unit types, while Other is a composite type. All
> are subtypes of Color. This works, but it's a bit verbose, and it's no
> longer efficient (particularly if we have a large vector of Colors, for
> example). (Let's just pretend for now that we have a way to check the
> dynamic type of an object, so that we can use these types similarly to how
> we might use our enum.)
>
> With a "variant" construct, we could write this as:
>
> variant Color {
> Red,
> Green,
> Blue,
> Other { var name: string; }
> }
>
> We should think of this as listing the subtypes of Color. Some or all of
> these subtypes can be composite types, with one or more fields.
> Equivalently, I suppose we could write: Color { Red{}, Green{}, Blue{},
> Other{var name: string;}}. To be clear, Other doesn't have an anonymous
> record attached to it; Other is a composite type itself, and values of type
> Color.Other have a field called "name".
>
> This is almost syntactic sugar for the class hierarchy above, except that
> the "variant" would have value semantics, and would be implemented
> efficiently (e.g. `struct Color { int tag; union { string Other_name; }}`).
>
> The special case of "variant" where all the subtypes are unit types is
> obviously the same as "enum". This special case could be implemented
> exactly the same as today's enums, and an efficient implementation could be
> guaranteed in the language spec. (Above, I said to ignore the idea of the
> user assigning an integer value to each subtype, but we could certainly
> allow `variant Color { Red = 1, Blue = 2, Green = 3 }`. I'm not sure if it
> makes sense to assign 'Other' an integer value, or whether this would be
> restricted to variants with only unit types, but I want to stick to the
> bigger picture for now).
>
>
> Now, back to "union". In my mind, the unifying theme between "enum" and
> "union" is the "one of" concept. For example:
>
> enum EResult { Success, Error }
> union UResult { var Success: int; var Error: string; }
> var v1: EResult = f1();
> var v2: UResult = f2();
>
> v1 is "one of" Success or Error, but we *can't* attach other information
> to either of the two possibilities.
> v2 is "one of" Success or Error, and we *have to* attach other
> information to each of the two possibilities.
>
> We can replace these two separate concepts with "variant", which allows us
> to attach information to some possibilities and not others, as we wish:
>
> variant VResult {
> Success,
> Error { var message: string; }
> }
>
> Because of how "variant" is implemented, we no longer need "union".
>
> This:
> record Error { var message: string; var position: int; }
> union NumOrError {
> var num: int;
> var error: Error;
> }
>
> would instead be written:
>
> variant NumOrError {
> Num { var value: int; },
> Error { var message: string; var position: int; }
> }
>
> and could be implemented exactly the same.
>
> I think the latter is much nicer. There's no real indication in the "union"
> syntax (or variations on that syntax) that the fields are mutually
> exclusive, other than the word "union" (or whatever it would be called). It
> looks too much like a record.
>
> In "variant" it's (hopefully) clear that a value can only be one of the
> subtypes at a time (like an enum), and there's a natural syntax for
> matching and unpacking the subtypes in a 'select' statement:
> select x {
> when Num{value} do ...
> when Error{msg, pos} do ...
> }
>
>
> To repeat the ideas on optional returns, and possible failures, that could
> look something like:
> (Using Kyle's generic syntax proposal.)
>
> // Copying Haskell
> variant Maybe(type T) {
> Just { var value: T; },
> Nothing
> }
> var c: Maybe(string) = getUserInputOrTimeout();
>
> variant Expected(type T) {
> Success { var value: T; },
> Error { var message: string; }
> }
> var n: Expected(int) = parseInt(getUserInput());
>
> // This is just a special case of Haskell's Either type:
> variant Either(type A, type B) {
> Left { var value: A; },
> Right { var value: B; }
> }
> var x: Either(int, string) = f();
>
>
> This doesn't address everything in your notes, but it's about time I called
> it a night. Please note that I haven't tied my ego to these ideas, and I'm
> certainly open to discussing other options.
>
> - Sean
>
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers