Hi Sean --
We had a deep-dive yesterday on this topic so that Kyle could quickly
bring the team up-to-speed with the conversation, motivation, and some of
the decisions being wrestled with for those of us who'd fallen behind on
the email thread and didn't have a lot of familiarity with Rust's sum
types.
At the end of the meeting, I took an action item to try and capture some
notes on our discussion, along with some of the sticking points, as a
response to this thread. It comes with the caveat that the notes I took
probably reflect my own opinions more strongly than others', and that
there are some things that I've noted here which weren't discussed in the
meeting (due to lack of time or importance). Also, I've tried to focus on
capturing things rather than polishing this document, so I apologize if
it's a bit sprawling. Hopefully I've at least been consistent in
terminology and such.
Questions come up throughout the topic, but I've tried to call out
specific questions at the end of each section to focus on.
Hope this is helpful for giving a sense of what concerns the team
currently has and what it would take to put a successful sum type
proposal together,
-Brad
Traditional (C-style) enums vs. Rust-style "sum type" enums:
------------------------------------------------------------
I think there was a certain mental sticking point within the team, in
unifying traditional (C-style) enums and the "sum type" style of enum
that's being proposed here. I.e., are they really something that should
be unified, and to what benefit?
For most of the conversation, I think many of us were thinking that we
should have two language concepts because we didn't see any correlation
between the two features. In this perspective, one might use the keyword
'enum' for a traditional C-style enum and something different ('type
enum'? 'variant record'? 'overlap record'? 'union' (Kyle's objection
that it isn't truly a union in the set-theoretic sense noted, though
arguably we'd just be inheriting C's abuse of the term)?) for the other.
We never did come up with a satisfying keyword proposal for sum types, but
in these notes I'll use 'overlap record' as a placeholder to keep the two
cases teased apart (note that I'm not actually proposing we adopt this --
it's just sufficiently clear/different/record-like that it serves as a
good placeholder for me. Also, I'm not trying to shut down
the possibility of unifying them, just trying to keep them distinct for
the sake of conversation).
So, imagine that a traditional Chapel (C-style) enum looks something like
this:
enum Color {
red = 1,
blue = 2,
green = 4
}
And a "sum type" style (most similar to 'union' in Chapel today) looks
something like this:
overlap record MyUnionishThing {
var val: int; // maybe I store an int
var str: string; // or maybe I store a string
}
[Note that the former uses commas, whereas for the latter, I've used
semicolons to make it look more like the fields of a class or record.
This disparity would have to be resolved if these concepts were to be
unified into a single concept].
Under the covers, one should imagine this second construct as resulting in
a C-level implementation like this:
struct MyUnionishThing {
  int tag;        // which field is active?
  union {
    int val;      // use a C union for the fields themselves
    string str;
  };
};
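For readers who know Rust better than C, here is a minimal sketch of the same idea, using Rust's 'enum' as a stand-in for the 'overlap record' placeholder. The variant names `Val` and `Str` and the helper `active_field` are made up for this illustration and are not part of any proposal. Conceptually the compiler keeps a hidden tag alongside the payload, much like the struct above.

```rust
/// Rust analogue of the 'overlap record' placeholder: exactly one variant
/// is active at a time, and the compiler tracks a hidden tag for us.
#[derive(Debug, Clone, PartialEq)]
pub enum MyUnionishThing {
    Val(i64),    // maybe I store an int
    Str(String), // or maybe I store a string
}

/// Report which field is active by inspecting the tag via `match`.
/// (Hypothetical helper, named only for this sketch.)
pub fn active_field(t: &MyUnionishThing) -> &'static str {
    match t {
        MyUnionishThing::Val(_) => "val",
        MyUnionishThing::Str(_) => "str",
    }
}
```

The point of the sketch is that the tag is never named or set by the user; it is only observable through the match.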
Kyle pushed back on this old-school thinking ("aren't these two
concepts?") by arguing that combining the two concepts into one had merit
in that one could have a traditional enum that also mixed in type fields.
So, imagine for example, something like this, used to store three
well-defined colors compactly, and other colors by name (artificial,
perhaps):
overlap record Color {
red = 1,
blue = 2, // note that I'm still using commas here, not
green = 4, // for any good reason other than habit
var otherColor: string;
}
Q: Are there practical, compelling use cases of this kind of mixing, or
is this strictly academic?
There was some confusion about what 'red', 'blue', and 'green' meant in
the above -- i.e., were they integer fields, equivalent to
the following?
overlap record Color {
const red: int = 1;
const blue: int = 2;
const green: int = 4;
var otherColor: string;
}
Kyle argued that, no, they weren't/shouldn't be, because they don't
logically have/need storage associated with them and can't change value.
In that sense, they're more like a "for free" interpretation of what the
"tag" means in the underlying implementation. I.e., the above would map
down to:
struct Color {
  int tag;        // which field is active?
  union {
    string otherColor;
  };
};
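As a point of comparison, this is roughly how the mixed case looks in Rust, where payload-free variants cost nothing beyond the tag. This is only a sketch: the variant names and the helper `code` are assumptions for illustration, and the 1/2/4 codes are recovered through a function here rather than being the tag values themselves.

```rust
/// Three well-known colors that need no payload, plus a named fallback.
#[derive(Debug, Clone, PartialEq)]
pub enum Color {
    Red,
    Blue,
    Green,
    Other(String), // the only variant that needs storage beyond the tag
}

/// Map the payload-free variants to the integer codes used in these notes
/// (1, 2, 4). `Other` has no code, so the result is optional.
pub fn code(c: &Color) -> Option<i32> {
    match c {
        Color::Red => Some(1),
        Color::Blue => Some(2),
        Color::Green => Some(4),
        Color::Other(_) => None,
    }
}
```

Note that once a payload-carrying variant is mixed in, "convert to int" stops being a total operation, which bears on the backwards-compatibility questions for enums.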
This led us to think that the best way to write this out longhand using
Chapel concepts would be to describe such fields using Chapel's param
concept:
overlap record Color {
param red = 1, // could optionally put ': int' in here as
blue = 2, // well, but I left that out for brevity
green = 4;
var otherColor: string;
}
Thus, all C-style/traditional enums could arguably be rewritten in the new
system as 'param' fields of a sum type.
As part of this discussion, we discussed what advantages there might be in
keeping the C-style case separate/different from the general one and began
referring to "homogeneous param" sum types -- those in which all fields
were params of the same (or compatible?) type. I believe the two main
benefits were considered to be:
(a) guaranteed cheap representation
(b) ease of learning for users coming from C-like languages rather than
Rust-like languages.
A related design goal that nobody objected to (outright) was to prevent
current codes that use 'enums' from breaking if at all possible.
To that end, it's worth asking "how do we use (or envision using) enums
today? What do we rely on?" As best I can tell, the answer is:
* value comparisons in conditions and when clauses (probably obvious)
* ability to convert to/from ints in lightweight manner (coercion/cast)
* ability to cast enums to/from strings
* ability to do symbolic I/O with console/files (unlike C) -- i.e.,
writeln(myColor); will result in 'green' or 'blue' rather than '4'
or '2'. Ditto on input.
* ability to create domains (index sets) over set of possible values
and arrays over those domains
* support for 'config var's and 'config const's of enum type which
support specifying the values via strings.
* support for 'param's and 'config param's of enum type (suggests support
for compile-time reasoning about values, similar to param integers)
* ability to create sync vars of enum type because they're known to
be singleton, built-in values (sync vars of records and other
types that have multiple components are in the process of being
deprecated)
* automatic downcast semantics similar to param values (i.e., passing
an enum whose values all fit within int(8) to a routine expecting
int(8) even though enum might be considered default int in size by
default)
* ability to iterate over all values (undocumented, but used in the
implementation I believe -- and seems helpful to provide to users)
* ability to define secondary methods on enums
* (proposed, but not implemented) ability to specify an int size that
should be used to represent the enum (would be equivalent to a
param of that int size for dispatch purposes; var of that size for
execution time purposes)
* (proposed, but not implemented) ability to define 'extern enum's
in order to interoperate with enums from C within Chapel; symmetrically,
one might want to export Chapel enums to C.
* (proposed, but not implemented) ability to 'use' enum to inject its
symbols into current scope. This would allow one to do:
use Color;
...red...
rather than:
...Color.red...
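To make a few of the items above concrete, here is a rough sketch of how some of them (integer conversion, symbolic output, parsing from a string, iterating over all values) might be written by hand for a C-style enum in Rust. It is a comparison point only, not a proposal for Chapel; the constant `ALL` and the lower-case names are choices made for this illustration.

```rust
use std::fmt;
use std::str::FromStr;

/// C-style enum with explicit integer values, as in the Color example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color {
    Red = 1,
    Blue = 2,
    Green = 4,
}

impl Color {
    /// Every value, in declaration order, so callers can iterate over them.
    pub const ALL: [Color; 3] = [Color::Red, Color::Blue, Color::Green];
}

/// Symbolic output: prints "green" rather than 4.
impl fmt::Display for Color {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Color::Red => "red",
            Color::Blue => "blue",
            Color::Green => "green",
        };
        f.write_str(name)
    }
}

/// Symbolic input: parse a value back from its name.
impl FromStr for Color {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Color::ALL
            .iter()
            .copied()
            .find(|c| c.to_string() == s)
            .ok_or_else(|| format!("unknown color: {}", s))
    }
}
```

With these definitions, `Color::Green as i32` yields the integer value and `"red".parse::<Color>()` goes the other way from a string. Everything except the integer cast had to be spelled out, which gives a sense of how much the current enum concept provides for free.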
Summarizing the main questions noted in passing from this section:
Q: Should enums and sum types be one language concept or two?
- If two...,
Q: Can enums be considered a sugar over sum types, or are
they different in some deeper way?
Q: Conversely, do we treat homogeneous param sum types
identically to enums or differently?
- If one...
Q: How to resolve backwards compatibility with current concept
while trying to make sum types more in the record/class camp?
Q: Do all of the use cases above make sense for sum types as
well?
Q: Are we in any danger of making C-style enums heavier weight
than they need to be?
Q: Whether one or two concepts, what specific syntax should be used
for sum types?
- keyword(s)
- solve commas (list) vs. semicolons (fields)
- other...
Relationship between anonymous records and sum types
----------------------------------------------------
I think the second-biggest sticking point for much of the team (maybe even
the first for me) was the conflation of the sum type discussion with, in
my mind, support for anonymous records. For example, I could imagine that
the barriers to getting consensus on sum types would be lowered
significantly if each of the options in the sum type was a single field of
a single type (where that type may, itself, store a multiplicity of
values). E.g.:
overlap record MyUnionishThing {
var val: int; // maybe I store an int
var str: string; // or maybe I store a string
var myR: R; // or maybe a record
var myC: C; // or maybe a class
}
Where I think the proposal (and email thread) starts to bog down for many
of us is in (what I'd describe as) trying to wedge anonymous records (and
maybe pattern matching) into the discussion. I understand that this is
arguably a common/useful feature in sum types, but it's not clear to me
that we should support it here in Chapel without doing it via general
support for anonymous record types in other contexts as well.
As an example, imagine that we had the ability to create an anonymous
record in an expression context as follows:
'record { var x: int; var y: string; }'
Then, it seems we could use this in contexts like passing a type to a
function:
proc foo(type t) { var x: t; ... }
foo(record { var x: int; var y: string; });
or instantiating a class with a record:
var myVec = new Vect(eltType = record { ... });
or using it to create an array of anonymous record types:
var A: [1..100, 1..100] [1..3] record { var x: int;
var y: string;
}
So an open question that wasn't resolved well during our meeting yesterday
is:
Q: What is it (if anything) about sum types that wants anonymous records
more than other contexts?
And in my mind:
Q: If we took the "single field/type" approach illustrated at the top of
this section and then added an anonymous record concept, would that
give us everything we wanted? That is:
overlap record MyUnionishThing {
var val: int; // maybe I store an int
var str: string; // or maybe I store a string
var myR: record { var x: int;
var y: string;
}; // or maybe an anon record
}
Or do the two things really have to be co-developed for some reason?
After the meeting, I also wondered whether Chapel's current support for
tuple-style declarations would give us what we wanted here. That is:
overlap record MyUnionishThing {
var (x,y): (int, string); // maybe I store an int,string pair
var t: (real, string); // or maybe a real,string pair
}
Thus:
Q: Are tuple types insufficient for cases like these? If so, why?
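As one data point for this question, here is a small Rust sketch in which every variant carries exactly one payload and a tuple supplies the multiplicity. The variant names `Xy` and `T` and the helper `describe` are hypothetical, chosen to mirror the declaration above.

```rust
/// Each variant carries a single payload; a tuple supplies the multiplicity.
#[derive(Debug, Clone, PartialEq)]
pub enum MyUnionishThing {
    Xy((i64, String)), // maybe I store an int,string pair
    T((f64, String)),  // or maybe a real,string pair
}

/// Destructure the tuple payload positionally. The names bound here are
/// local to each match arm, not fixed by the type definition.
pub fn describe(u: &MyUnionishThing) -> String {
    match u {
        MyUnionishThing::Xy((x, y)) => format!("int {} with {}", x, y),
        MyUnionishThing::T((r, s)) => format!("real {} with {}", r, s),
    }
}
```

What the tuple form gives up is field names at the definition site; whether that loss matters in practice is essentially the question being asked.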
My summary of this section is that opening up the anonymous record
question opens up a Pandora's box of issues about the relationship to
other language features, which it seems like it would be nice to avoid or
to deal with orthogonally if possible. So if no value/opportunity would be
lost, I'd suggest focusing on the sum type concept first (relying on
traditional records or perhaps tuples when a multiplicity of values is
needed) and tackling anonymous records or pattern-matching sugars for
select statements as a separate topic.
Select-related questions
------------------------
This reminds me of a few other questions related to select statements (may
be mine only):
Q: Is 'select' the right keyword for determining which field of a sum type
is active given that it doesn't have this meaning in any other context?
(i.e., such cases can't be re-written as a chained conditional and are
more like a meta-programming/pattern-matching construct). Should some
other keyword be used to distinguish this case instead? Could this
keyword be unified with whatever syntax is used to declare a sum type
itself? (e.g., 'overlap select' for my 'overlap record' placeholder?)
Q: Should we permit the ability to access a sum type's "field" directly
in the event that a user has reason to believe they know which field
is active as a means of avoiding the need to use a 'select' to be
guaranteed to be safe? This would seem to represent a productivity
(ability to sketch code) vs. safety (ability to write incorrect code)
issue. I'm imagining that an execution time error would occur if the
inactive field was accessed (which I think is what our current unions
do?). One might argue that this is similar to the fact that we allow
array accesses that may be out of bounds rather than requiring all
array accesses to be wrapped within a bounds check (?).
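The tradeoff can be sketched in Rust as a pair of accessors: one that forces the caller to handle the inactive case, and one that allows direct access but fails at execution time. The method names `try_val` and `val` are made up for this illustration; nothing here is claimed to match how Chapel's current unions behave.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum MyUnionishThing {
    Val(i64),
    Str(String),
}

impl MyUnionishThing {
    /// Safe access: the caller must handle the "wrong field" case.
    pub fn try_val(&self) -> Option<i64> {
        match self {
            MyUnionishThing::Val(v) => Some(*v),
            _ => None,
        }
    }

    /// Direct access: convenient when sketching code, but panics at
    /// execution time if `val` is not the active field.
    pub fn val(&self) -> i64 {
        self.try_val().expect("field 'val' is not active")
    }
}
```

Here `val()` plays the role of the unchecked-looking array access: short to write, checked only when the program runs.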
We also wrestled a bit with syntax of select statements, but I don't think
these questions were a major sticking point for anyone, and they seemed
like a secondary concern overall. Specifically, I didn't write down any
significant questions or notes here apart from a desire to not have
to say:
when MyUnionishThing.x ...
when MyUnionishThing.y ...
in favor of simply:
when x
when y
To that end, I jotted down a question to the effect of:
Q: Can a select on a sum type be treated as a 'use' of that type? (trying
to unify it with the proposal to 'use' current enum types as a means of
injecting the symbols into the current scope).
We also discussed Kyle's proposal to drop 'when's from selects (also
arguably orthogonal to this discussion) and there were no significant
objections, though some of us prefer them from a readability perspective,
so it was proposed that if there was a move to drop them, perhaps they
could be made optional for those who prefer readability (by some
definition) over brevity.
Generic sum types
-----------------
There was a little bit of a discussion about how to express generic sum
types, similar to what you and Kyle were proposing on email. My takeaway
was that one should be able to write cases that were very explicit about
their types and those that relied on more of the inference-type of
declaration, similar to records today:
overlap record R {
var x: int; // maybe I'll store an int?
var y; // maybe I'll store something else of unknown
// type -- look at constructor call/type
// signatures to determine? Does that make
// sense?
}
Where the more explicit case might look something like:
overlap record R {
type T;
var x: T; // maybe I'll store a T?
var y: 2*T; // or maybe I'll store 2 T's?
}
or perhaps:
overlap record R {
type T1;
type T2;
var x: T1; // maybe I'll store a T1?
var y: T2; // maybe I'll store a T2?
}
We also discussed an alternative syntax like the one you were proposing:
overlap record R(type T1, type T2) {
var x: T1; // maybe I'll store a T1?
var y: T2; // maybe I'll store a T2?
}
though I think this is arguably an orthogonal conversation from sum types
and should be taken up separately (specifically, we should either permit
this for record and class types as well, or for none of them).
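For reference, the explicit form corresponds closely to how Rust spells a generic sum type. The sketch below is only an analogy; the variant names `X` and `Y` and the helper `is_x` are assumptions for illustration.

```rust
/// Generic sum type with the type parameters named up front, in the spirit
/// of 'overlap record R(type T1, type T2)'.
#[derive(Debug, Clone, PartialEq)]
pub enum R<T1, T2> {
    X(T1), // maybe I'll store a T1
    Y(T2), // maybe I'll store a T2
}

/// Works for any instantiation, since it only inspects the tag.
pub fn is_x<T1, T2>(r: &R<T1, T2>) -> bool {
    matches!(r, R::X(_))
}
```

One thing the analogy brings out: in Rust a bare `R::X(1)` says nothing about `T2`, so the full type has to come from an annotation such as `let a: R<i64, String> = R::X(1);` or from surrounding context. An inferred-type form for sum types would need some answer to the same problem, because only one field's value is ever present at construction time.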
Q: Given that the types of fields in sum types are not all present at
once, does the inferred type form (the first above) make sense in
the Chapel world? What are the rules / how would the type of y
be specified?
Q: Are there any other details to work through here?
Q: Does the fact that C-style enums don't really need this type of generic
support (that I can detect) suggest taking a non-unified approach?
Primary methods on sum types/enums
----------------------------------
I don't know that we discussed this, but it came to mind afterwards, and I
see that it came up in your discussion as well. When I think of a sum
type (again, thinking of it more like a record or class), I imagine myself
being able to write methods within its definition as follows:
overlap record MyUnionishThing {
var val: int; // maybe I store an int
var str: string; // or maybe I store a string
proc foo() {
select this ... // do something based on whether val or str
} // is active
}
whereas for enums, it seems odd to me to support something like:
enum Color {
red = 1,
blue = 2,
green = 4
proc isPrimary() {
return (this < 3);
}
}
and instead, I'd expect to see this written as a secondary method:
enum Color {
red = 1,
blue = 2,
green = 4
}
proc Color.isPrimary() {
return (this < 3);
}
(of course, I should be able to write the previous case as a secondary
method as well and I don't think anyone would think otherwise):
proc MyUnionishThing.foo() {
select this ... // do something based on whether val or str
} // is active
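For comparison, Rust avoids the primary/secondary distinction by always declaring methods in a separate 'impl' block, for both styles of enum. The sketch below mirrors the two examples above; `is_primary` keeps the same '< 3' test from these notes, and the body of `foo` is invented purely so that there is something to run.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Color {
    Red = 1,
    Blue = 2,
    Green = 4,
}

// Methods live outside the type's body for C-style enums...
impl Color {
    pub fn is_primary(self) -> bool {
        (self as i32) < 3
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum MyUnionishThing {
    Val(i64),
    Str(String),
}

// ...and for sum types alike.
impl MyUnionishThing {
    /// Do something based on whether val or str is active.
    pub fn foo(&self) -> String {
        match self {
            MyUnionishThing::Val(v) => format!("val={}", v),
            MyUnionishThing::Str(s) => format!("str={}", s),
        }
    }
}
```

This is closest to the "secondary method" form in Chapel terms, which may be a mild argument that secondary methods alone would be sufficient for both concepts.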
Questions from Sean's mail:
Q: Should sum types have value or reference semantics?
My A: "value" seems more intuitive to me given their C-style enum roots,
and seeming similarity to records.
Relationship between sum types and error handling
-------------------------------------------------
I'll mention that the main motivations that the team identified for
pursuing this were:
* fix/improve our current union story (which, arguably is broken only in
the lack of a field selection mechanism and an arguable abuse of the
term 'union').
* a possible way of dealing with error cases in things like library
routines
- this led to a healthy lunchtime debate over the tradeoffs between
sum-type Maybe/Error return types vs. optional error arguments as
our I/O routines currently support (which could be cleaned up with
a proposed default argument query capability) vs. a more traditional
exception model. But that's a completely different email thread, and
something that I think we may task our new hire to study. There
are tradeoffs between safety and bulletproofness and tractability
here that we need to spend more time wrestling through in my opinion.
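To make the first two options in that debate concrete, here is a rough Rust sketch contrasting a sum-type return value with an optional error argument. Both function names are hypothetical, and the choice to panic when no error argument is supplied is an assumption made for this illustration rather than a description of any existing routine.

```rust
/// Style 1: the error is part of the return type, so callers cannot read
/// the quotient without first checking which variant came back.
pub fn div_checked(a: i64, b: i64) -> Result<i64, String> {
    if b == 0 {
        Err("division by zero".to_string())
    } else {
        Ok(a / b)
    }
}

/// Style 2: an optional error argument. Passing `None` means "I am not
/// checking"; in this sketch that case panics instead of returning.
pub fn div_with_err_arg(a: i64, b: i64, err: Option<&mut Option<String>>) -> i64 {
    match (div_checked(a, b), err) {
        (Ok(q), Some(slot)) => {
            *slot = None; // report success through the error slot
            q
        }
        (Ok(q), None) => q,
        (Err(e), Some(slot)) => {
            *slot = Some(e);
            0 // placeholder result; the caller is expected to look at `err`
        }
        (Err(e), None) => panic!("{}", e),
    }
}
```

The second style still has to return some value in the failure case (0 here), and nothing forces the caller to inspect the error slot. The first style makes that inspection unavoidable at the cost of more ceremony at every call site.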
On Wed, 21 Jan 2015, Sean Billig wrote:
> Hi Kyle,
>
>
>> With the Maybe example, one bit of syntax doesn't quite make sense:
>>
>
>> `Maybe(int)` - what is `int` describing here? What happens if the enum
>> has a generic Left and Right? I think doing something along the lines of:
>>
>> enum Or(type U, type V) {
>> Left { var value: U; },
>> Right { var value: V; }
>> }
>>
>> could work here.
>>
>
> I was following the current syntax for generic records, e.g.
> record P { var x; }
> var r : P(real);
>
> That said, I definitely prefer your suggestion of naming the generic types
> explicitly. I think this should supplant both existing options for generic
> types (i.e. both `record P` above and `record Q { type T; var x: T}` would
> become `record R(type T) { var x: T; }`).
>
> Some open questions I see in this related to this topic:
>>
>> 1. Reference vs Value semantics for enums? Some situations want one or
>> the other.
>>
>
> I'd say value semantics. In cases where the user wants reference semantics,
> the value could be wrapped in something like rust's Box or Rc/Arc types
> (c++'s unique_ptr or shared_ptr types). I'm of the opinion that making
> value semantics the default (where reasonable), and making reference
> semantics a tiny bit more work leads to better code.
>
> We might want to discuss move semantics at some point, but I'm happy
> postponing that.
>
> 2. Do we allow anonymous product types?
>> enum Or(type U, type V) {
>> Left( U ),
>> Right( V, V )
>> }
>>
>
> I have mixed feelings about this. If we require that the field names used
> in select statements match those used in the enum definition, then enum
> types defined in libraries might be awkward to use. For example, if Either
> were defined in the standard library like so:
> enum Either(type U, type V) {
> Left{ var value: U; },
> Right{ var value: V; }
> }
>
> Then the user is forced to use the name "value" in their select statement:
> var a: Either(int, real) = foo();
> select a {
> when Left{value} do ...
> when Right{value} do ...
> }
> which might mask a variable in the enclosing scope, or might be less
> desirable than using more informative names.
>
> Maybe this is an argument against requiring that the field names used in
> `select` match those used in the enum definition...
>
> If we're going to allow anonymous products in enums, should we allow them
> to stand alone as well (i.e. tuples with a type name)? Something like
> `record X(int, string);`? How would generics look? Maybe we should adopt
> angled brackets for generic types:
> record Pair<A, B>(A, B); // a la rust
> record R<U, V> { var x: U; var y: V; }
> var r: R<int, real>;
> enum Maybe<T> { Just(T), Nothing }
> enum Maybe<T> { Just { var value: T; }, Nothing }
>
>> Aside: I also like this syntax for record initialization it feels more
>> in-tune with the value semantics.
>>
>> For now lets stick to the normal parens, and address this as a separate
>> issue.
>>
>
> If we use positional pattern matching in `select`, and use parens for
> initialization (i.e. a constructor), we don't really need anonymous
> products. For example:
>
> enum Either(type U, type V) {
> Left{ var value: U; },
> Right{ var value: V; }
> }
>
> var a: Either(int, string) = Right("hi");
> select a {
> when Left{count} do ... // meaningful name, not "value"
> when Right{msg} do ...
> }
>
> enum Entity {
> Person { var name: string; var ssn: string; },
> Dog { var name: string; }
> }
> var bill = Person(name="Bill", ssn="123-45-6789");
> // This is probably just as good as
> // Person{name:"Bill", ssn:"123-45-6789"};
>
> If we do this, then anonymous products probably aren't compelling enough to
> add to the language, either within enums or outside of enums.
>
> I don't have strong feelings about any of this yet; just thinking out loud
> at this point.
>
> - Sean
>