Re: [Chapel-developers] Tagged unions

Brad Chamberlain Wed, 28 Jan 2015 17:24:07 -0800

Hi Sean --

We had a deep-dive yesterday on this topic so that Kyle could quickly 
bring the team up-to-speed with the conversation, motivation, and some of 
the decisions being wrestled with for those of us who'd fallen behind on 
the email thread and didn't have a lot of familiarity with Rust's sum 
types.

At the end of the meeting, I took an action item to try and capture some 
notes on our discussion, along with some of the sticking points, as a 
response to this thread. It comes with the caveat that the notes I took 
probably reflect my own opinions more strongly than others', and that 
there are some things that I've noted here which weren't discussed in the 
meeting (due to lack of time or importance).  Also, I've tried to focus on 
capturing things rather than polishing this document, so I apologize if 
it's a bit sprawling.  Hopefully I've at least been consistent in 
terminology and such.

Questions come up throughout the topic, but I've tried to call out 
specific questions at the end of each section to focus on.

Hope this is helpful for giving a sense of what concerns the team
currently has and what it would take to put a successful sum type
proposal together,

-Brad

Traditional (C-style) enums vs. Rust-style "sum type" enums:
------------------------------------------------------------

I think there was a certain mental sticking point within the team, in 
unifying traditional (C-style) enums and the "sum type" style of enum 
that's being proposed here.  I.e., are they really something that should 
be unified, and to what benefit?

For most of the conversation, I think many of us were thinking that we 
should have two language concepts because we didn't see any correlation 
between the two features.  In this perspective, one might use the keyword 
'enum' for a traditional C-style enum and something different ('type 
enum'?  'variant record'?  'overlap record'?  'union' (Kyle's objection 
that it isn't truly a union in the set-theoretic sense noted, though 
arguably we'd just be inheriting C's abuse of the term)?) for the other. 
We never did come up with a satisfying keyword for sum types, so in these 
notes I'll use 'overlap record' as a placeholder to keep the two cases 
teased apart.  (Note that I'm not actually proposing we adopt this -- it's 
just sufficiently clear/different/record-like to serve as a placeholder. 
Also, I'm not trying to shut down the possibility of unifying them, just 
trying to keep them distinct for the sake of conversation.)

So, imagine that a traditional Chapel (C-style) enum looks something like 
this:

        enum Color {
          red = 1,
          blue = 2,
          green = 4
        }

And a "sum type" style (most similar to 'union' in Chapel today) looks 
something like this:

        overlap record MyUnionishThing {
          var val: int;      // maybe I store an int
          var str: string;   // or maybe I store a string
        }

[Note that the former uses commas, whereas for the latter, I've used 
semicolons to make it look more like the fields of a class or record. 
This disparity would have to be resolved if these concepts were to be 
unified into a single concept].

Under the covers, one should imagine this second construct as resulting in 
a C-level implementation like this:

        struct MyUnionishThing {
          int tag;  // which field is active?
          union {
            int val;    // use a C union for the fields themselves
            string str;
          };
        };
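
For comparison, here is a minimal sketch of the same idea in Rust (used 
here only because the Chapel syntax is still undecided).  The compiler 
generates the tag and the overlapping storage itself.  The names 
'MyUnionishThing', 'Val', 'Str', and 'describe' are illustrative and not 
part of any proposal:

```rust
// A Rust enum is a tagged union: the compiler stores a tag plus
// enough space for the largest variant.
#[derive(Debug, Clone, PartialEq)]
enum MyUnionishThing {
    Val(i64),    // maybe I store an int
    Str(String), // or maybe I store a string
}

// The tag can only be inspected by matching, so the inactive
// field can never be read by accident.
fn describe(u: &MyUnionishThing) -> String {
    match u {
        MyUnionishThing::Val(v) => format!("int {}", v),
        MyUnionishThing::Str(s) => format!("string {}", s),
    }
}

fn main() {
    println!("{}", describe(&MyUnionishThing::Val(42)));
    println!("{}", describe(&MyUnionishThing::Str("hi".to_string())));
}
```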

Kyle argued against this old-school thinking ("aren't these two 
concepts?") by arguing that combining the two concepts into one had merit 
in that one could have a traditional enum that also mixed in type fields. 
So, imagine for example, something like this, used to store three 
well-defined colors compactly, and other colors by name (artificial, 
perhaps):

        overlap record Color {
          red = 1,
          blue = 2,  // note that I'm still using commas here, not
          green = 4, // for any good reason other than habit
          var otherColor: string;
        }

Q: Are there practical, compelling use cases of this kind of mixing, or
    is this strictly academic?

There was some confusion about what 'red', 'blue', and 'green' meant in 
the above -- i.e., were they integer fields, equivalent to the following?

        overlap record Color {
          const red: int = 1;
          const blue: int = 2;
          const green: int = 4;
          var otherColor: string;
        }

Kyle argued that, no, they weren't/shouldn't be, because they don't 
logically have/need storage associated with them and can't change value. 
In that sense, they're more like a "for free" interpretation of what the 
"tag" means in the underlying implementation.  I.e., the above would map 
down to:

        struct Color {
          int tag;  // which field is active?
          union {
            string otherColor;
          };
        };

This led us to think that the best way to write this out longhand using 
Chapel concepts would be to describe such fields using Chapel's param 
concept:

        overlap record Color {
          param red = 1,   // could optionally put ': int' in here as
                blue = 2,  // well, but I left that out for brevity
                green = 4;
          var otherColor: string;
        }

Thus, all C-style/traditional enums could arguably be rewritten in the new 
system as 'param' fields of a sum type.
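
As a rough analog of the mixed case, a Rust enum can combine variants that 
carry no data with a variant that does.  This is only a sketch: it omits 
the integer values 1, 2, and 4 from the example above, and the names 
'Other' and 'name' are made up for illustration:

```rust
// Hypothetical analog of the mixed 'Color' example: three fixed
// colors that carry no payload, plus one variant that carries a name.
#[derive(Debug, Clone, PartialEq)]
enum Color {
    Red,
    Blue,
    Green,
    Other(String),
}

// Illustrative helper: only the payload-carrying variant has data to bind.
fn name(c: &Color) -> String {
    match c {
        Color::Red => "red".to_string(),
        Color::Blue => "blue".to_string(),
        Color::Green => "green".to_string(),
        Color::Other(s) => s.clone(),
    }
}

fn main() {
    println!("{}", name(&Color::Red));
    println!("{}", name(&Color::Blue));
    println!("{}", name(&Color::Green));
    println!("{}", name(&Color::Other("teal".to_string())));
}
```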

As part of this discussion, we discussed what advantages there might be in 
keeping the C-style case separate/different from the general one and began 
referring to "homogeneous param" sum types -- those in which all fields 
were params of the same (or compatible?) type.  I believe the two main 
benefits were considered to be:

(a) guaranteed cheap representation

(b) ease of learning for users coming from C-like languages rather than
     Rust-like languages.

A related design goal that nobody objected to (outright) was to prevent 
current codes that use 'enums' from breaking if at all possible.

To that end, it's worth asking "how do we use (or envision using) enums 
today?  What do we rely on?"  As best I can tell, the answer is:

* value comparisons in conditions and when clauses (probably obvious)

* ability to convert to/from ints in lightweight manner (coercion/cast)

* ability to cast enums to/from strings

* ability to do symbolic I/O with console/files (unlike C) -- i.e.,
   writeln(myColor); will result in 'green' or 'blue' rather than '4'
   or '2'.  Ditto on input.

* ability to create domains (index sets) over set of possible values
   and arrays over those domains

* support for 'config var's and 'config const's of enum type which
   support specifying the values via strings.

* support for 'param's and 'config param's of enum type (suggests support
   for compile-time reasoning about values, similar to param integers)

* ability to create sync vars of enum type because they're known to
   be singleton, built-in values (sync vars of records and other
   types that have multiple components are in the process of being
   deprecated)

* automatic downcast semantics similar to param values (i.e., passing
   an enum whose values all fit within int(8) to a routine expecting
   int(8) even though the enum might otherwise be considered a
   default-sized int)

* ability to iterate over all values (undocumented, but used in the
   implementation I believe -- and seems helpful to provide to users)

* ability to define secondary methods on enums

* (proposed, but not implemented) ability to specify an int size that
   should be used to represent the enum (would be equivalent to a
   param of that int size for dispatch purposes; var of that size for
   execution time purposes)

* (proposed, but not implemented) ability to define 'extern enum's
   in order to interoperate with enums from C within Chapel; symmetrically,
   one might want to export Chapel enums to C.

* (proposed, but not implemented) ability to 'use' enum to inject its
   symbols into current scope.  This would allow one to do:

        use Color;

        ...red...

   rather than:

        ...Color.red...
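
To make a few of the capabilities above concrete (int conversion, string 
conversion, symbolic output, a fixed representation size, and iteration 
over all values), here is how they might look for a C-style enum in Rust.  
This is a sketch for comparison only, not a Chapel design; 'ALL' and 
'as_str' are hand-written helpers I've invented, since Rust does not 
provide iteration over enum values by itself:

```rust
use std::convert::TryFrom;
use std::fmt;
use std::str::FromStr;

// C-style enum with an explicit one-byte representation.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red = 1,
    Blue = 2,
    Green = 4,
}

impl Color {
    // Hand-written list of all values, usable for iteration.
    const ALL: [Color; 3] = [Color::Red, Color::Blue, Color::Green];

    fn as_str(self) -> &'static str {
        match self {
            Color::Red => "red",
            Color::Blue => "blue",
            Color::Green => "green",
        }
    }
}

// Enum -> string, used for symbolic output.
impl fmt::Display for Color {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

// String -> enum, e.g. for reading a value from a config option.
impl FromStr for Color {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Color::ALL
            .iter()
            .copied()
            .find(|c| c.as_str() == s)
            .ok_or_else(|| format!("unknown color: {}", s))
    }
}

// Int -> enum; fails for integers that name no value.
impl TryFrom<u8> for Color {
    type Error = String;
    fn try_from(n: u8) -> Result<Self, Self::Error> {
        Color::ALL
            .iter()
            .copied()
            .find(|c| *c as u8 == n)
            .ok_or_else(|| format!("no color with value {}", n))
    }
}

fn main() {
    // Iterate over all values, printing the symbolic name and the integer.
    for c in Color::ALL.iter() {
        println!("{} = {}", c, *c as u8);
    }
}
```

Note that everything beyond the plain cast had to be written by hand 
here, whereas the list above assumes Chapel provides these for free.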

Summarizing the main questions noted in passing from this section:

Q: Should enums and sum types be one language concept or two?

    - If two...,

      Q: Can enums be considered a sugar over sum types, or are
         they different in some deeper way?

      Q: Conversely, do we treat homogeneous param sum types
        identically to enums or differently?

    - If one...

      Q: How to resolve backwards compatibility with current concept
         while trying to make sum types more in the record/class camp?

      Q: Do all of the use cases above make sense for sum types as
         well?

      Q: Are we in any danger of making C-style enums heavier weight
        than they need to be?

Q: Whether one or two concepts, what specific syntax should be used
    for sum types?
    - keyword(s)
    - solve commas (list) vs. semicolons (fields)
    - other...

Relationship between anonymous records and sum types
----------------------------------------------------

I think the second-biggest sticking point for much of the team (maybe even 
the first for me) was the conflation of the sum type discussion with, in 
my mind, support for anonymous records.  For example, I could imagine that 
the barriers to getting consensus on sum types would be lowered 
significantly if each of the options in the sum type was a single field of 
a single type (where that type may, itself, store a multiplicity of 
values).  E.g.:

        overlap record MyUnionishThing {
          var val: int;      // maybe I store an int
          var str: string;   // or maybe I store a string
          var myR: R;        // or maybe a record
          var myC: C;        // or maybe a class
        }

Where I think the proposal (and email thread) starts to bog down for many 
of us is in (what I'd describe as) trying to wedge anonymous records (and 
maybe pattern matching) into the discussion.  I understand that this is 
arguably a common/useful feature in sum types, but it's not clear to me 
that we should support it here in Chapel without doing it via general 
support for anonymous record types in other contexts as well.

As an example, imagine that we had the ability to create an anonymous 
record in an expression context as follows:

        'record { var x: int; var y: string; }'

Then, it seems we could use this in contexts like passing a type to a 
function:

        proc foo(type t) { var x: t; ... }

        foo(record { var x: int; var y: string; });

or instantiating a class with a record:

        var myVec = new Vect(eltType = record { ... });

or using it to create an array of anonymous record types:

        var A: [1..100, 1..100] [1..3] record { var x: int;
                                                var y: string;
                                              }

So an open question that wasn't resolved well during our meeting yesterday 
is:

Q: What is it (if anything) about sum types that wants anonymous records
    more than other contexts?

And in my mind:

Q: If we took the "single field/type" approach illustrated at the top of
    this section and then added an anonymous record concept, would that
    give us everything we wanted?  That is:

        overlap record MyUnionishThing {
          var val: int;      // maybe I store an int
          var str: string;   // or maybe I store a string
          var myR: record { var x: int;
                            var y: string;
                          };        // or maybe an anon record
        }

    Or do the two things really have to be co-developed for some reason?

After the meeting, I also wondered whether Chapel's current support for 
tuple-style declarations would give us what we wanted here.  That is:

        overlap record MyUnionishThing {
          var (x,y): (int, string);  // maybe I store an int,string pair
          var t: (real, string);     // or maybe a real,string pair
        }

Thus:

Q: Are tuple types insufficient for cases like these?  If so, why?
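
For reference, Rust permits all three payload styles discussed in this 
section within a single sum type: a separately declared record, a tuple, 
and fields declared in place.  The sketch below shows them side by side; 
the variant names and the 'summary' helper are hypothetical:

```rust
// A named record type, declared separately (the "single field of a
// single type" approach).
#[derive(Debug, Clone, PartialEq)]
struct R {
    x: i64,
    y: String,
}

#[derive(Debug, Clone, PartialEq)]
enum MyUnionishThing {
    Val(i64),                     // a single scalar
    MyR(R),                       // a named record
    Pair((i64, String)),          // a tuple, no field names
    Inline { x: i64, y: String }, // fields declared in place
}

// Illustrative helper: each payload style is unpacked slightly differently.
fn summary(u: &MyUnionishThing) -> String {
    match u {
        MyUnionishThing::Val(v) => format!("{}", v),
        MyUnionishThing::MyR(r) => format!("{} {}", r.x, r.y),
        MyUnionishThing::Pair((n, s)) => format!("{} {}", n, s),
        MyUnionishThing::Inline { x, y } => format!("{} {}", x, y),
    }
}

fn main() {
    let things = vec![
        MyUnionishThing::Val(0),
        MyUnionishThing::MyR(R { x: 3, y: "c".to_string() }),
        MyUnionishThing::Pair((1, "a".to_string())),
        MyUnionishThing::Inline { x: 2, y: "b".to_string() },
    ];
    for t in &things {
        println!("{}", summary(t));
    }
}
```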

My summary of this section is that the anonymous record question opens a 
Pandora's box of issues about the relationship to other language features, 
which it seems like it would be nice to avoid or to deal with orthogonally 
if possible.  So if no value/opportunity were lost, I'd suggest focusing 
on the sum type concept first (relying on traditional records or perhaps 
tuples when a multiplicity of values is needed) and tackling anonymous 
records or pattern matching sugars for select statements as a separate 
topic.

Select-related questions
------------------------

This reminds me of a few other questions related to select statements (may 
be mine only):

Q: Is 'select' the right keyword for determining which field of a sum type
    is active given that it doesn't have this meaning in any other context?
    (i.e., such cases can't be re-written as a chained conditional and are
    more like a meta-programming/pattern-matching construct).  Should some
    other keyword be used to distinguish this case instead?  Could this
    keyword be unified with whatever syntax is used to declare a sum type
    itself?  (e.g., 'overlap select' for my 'overlap record' placeholder?)

Q: Should we permit the ability to access a sum type's "field" directly
    in the event that a user has reason to believe they know which field
    is active as a means of avoiding the need to use a 'select' to be
    guaranteed to be safe?  This would seem to represent a productivity
    (ability to sketch code) vs. safety (ability to write incorrect code)
    issue.  I'm imagining that an execution time error would occur if the
    inactive field was accessed (which I think is what our current unions
    do?).  One might argue that this is similar to the fact that we allow
    array accesses that may be out of bounds rather than requiring all
    array accesses to be wrapped within a bounds check (?).
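
One way to picture the tradeoff, sketched in Rust: a checked accessor that 
reports whether the field is active, next to a "trust me" accessor that 
halts at execution time if it is not.  The method names 'val' and 
'expect_val' are invented for this example:

```rust
#[derive(Debug, Clone, PartialEq)]
enum MyUnionishThing {
    Val(i64),
    Str(String),
}

impl MyUnionishThing {
    // Checked accessor: None when 'Val' is not the active field.
    fn val(&self) -> Option<i64> {
        match self {
            MyUnionishThing::Val(v) => Some(*v),
            _ => None,
        }
    }

    // "Trust me" accessor: panics at execution time on the wrong field,
    // much like an out-of-bounds array access.
    fn expect_val(&self) -> i64 {
        self.val().expect("field 'val' is not active")
    }
}

fn main() {
    let u = MyUnionishThing::Val(7);
    println!("{}", u.expect_val());

    let s = MyUnionishThing::Str("x".to_string());
    println!("{:?}", s.val());
}
```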

We also wrestled a bit with syntax of select statements, but I don't think 
these questions were a major sticking point for anyone, and seemed like a 
secondary concern overall.  Specifically, I didn't write down any 
significant questions or notes here apart from a desire to not have 
to say:

        when MyUnionishThing.x ...
        when MyUnionishThing.y ...

in favor of simply:

        when x
        when y

To that end, I jotted down a question to the effect of:

Q: Can a select on a sum type be treated as a 'use' of that type?  (trying
    to unify it with the proposal to 'use' current enum types as a means of
    injecting the symbols into the current scope).
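
Rust has something close to this, though the 'use' is explicit rather than 
implied by the match.  A sketch, with a made-up 'is_warm' function:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Blue,
    Green,
}

fn is_warm(c: Color) -> bool {
    // Bring the variant names into this scope only, so the arms
    // below can say 'Red' instead of 'Color::Red'.
    use Color::*;
    match c {
        Red => true,
        Blue | Green => false,
    }
}

fn main() {
    println!("{}", is_warm(Color::Red));
    println!("{}", is_warm(Color::Blue));
    println!("{}", is_warm(Color::Green));
}
```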

We also discussed Kyle's proposal to drop 'when's from selects (also 
arguably orthogonal to this discussion) and there were no significant 
objections, though some of us prefer them from a readability perspective, 
so it was proposed that if there was a move to drop them, perhaps they 
could be made optional for those who prefer readability (by some 
definition) over brevity.

Generic sum types
-----------------

There was a little bit of a discussion about how to express generic sum 
types, similar to what you and Kyle were proposing on email.  My takeaway 
was that one should be able to write cases that were very explicit about 
their types and those that relied on more of the inference-type of 
declaration, similar to records today:

        overlap record R {
          var x: int;  // maybe I'll store an int?
          var y;       // maybe I'll store something else of unknown
                       // type -- look at constructor call/type
                       // signatures to determine?  Does that make
                       // sense?
        }

Where the more explicit case might look something like:

        overlap record R {
          type T;
          var x: T;    // maybe I'll store a T?
          var y: 2*T;  // or maybe I'll store 2 T's?
        }

or perhaps:

        overlap record R {
          type T1;
          type T2;
          var x: T1;    // maybe I'll store a T1?
          var y: T2;    // maybe I'll store a T2?
        }

We also discussed an alternative syntax like the one you were proposing:

        overlap record R(type T1, type T2) {
          var x: T1;    // maybe I'll store a T1?
          var y: T2;    // maybe I'll store a T2?
        }

though I think this is arguably an orthogonal conversation from sum types 
and should be taken up separately (specifically, we should either permit 
this for record and class types as well, or for none of them).

Q: Given that the types of fields in sum types are not all present at
    once, does the inferred type form (the first above) make sense in
    the Chapel world?  What are the rules / how would the type of y
    be specified?
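
The "not all present at once" problem shows up in Rust as well, which may 
be a useful data point.  In the sketch below (names are illustrative), 
constructing the 'Left' case says nothing about the type of the 'Right' 
case, so that type has to be supplied some other way:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Either<U, V> {
    Left(U),
    Right(V),
}

fn describe(e: &Either<i64, String>) -> String {
    match e {
        Either::Left(n) => format!("left {}", n),
        Either::Right(s) => format!("right {}", s),
    }
}

fn main() {
    // 'Left(3)' alone fixes U but says nothing about V, so V must come
    // from an annotation (or from later use).
    let a: Either<i64, String> = Either::Left(3);
    let b: Either<i64, String> = Either::Right("hi".to_string());
    println!("{}", describe(&a));
    println!("{}", describe(&b));
}
```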

Q: Are there any other details to work through here?

Q: Does the fact that C-style enums don't really need this type of generic
    support (that I can detect) suggest taking a non-unified approach?

Primary methods on sum types/enums
----------------------------------

I don't know that we discussed this, but it came to mind afterwards, and I 
see that it came up in your discussion as well.  When I think of a sum 
type (again, thinking of it more like a record or class), I imagine myself 
being able to write methods within its definition as follows:

        overlap record MyUnionishThing {
          var val: int;      // maybe I store an int
          var str: string;   // or maybe I store a string

          proc foo() {
            select this ...  // do something based on whether val or str
          }                  // is active
        }

whereas for enums, it seems odd to me to support something like:

        enum Color {
          red = 1,
          blue = 2,
          green = 4

          proc isPrimary() {
            return (this < 3);
          }
        }

and instead, I'd expect to see this written as a secondary method:

        enum Color {
          red = 1,
          blue = 2,
          green = 4
        }

        proc Color.isPrimary() {
          return (this < 3);
        }

(of course, I should be able to write the previous case as a secondary 
method as well and I don't think anyone would think otherwise):

        proc MyUnionishThing.foo() {
          select this ...  // do something based on whether val or str
        }                  // is active
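
As a point of comparison, Rust sidesteps the primary/secondary question by 
putting all methods in a separate 'impl' block, for C-style enums and sum 
types alike.  A sketch, reusing the names from the examples above:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red = 1,
    Blue = 2,
    Green = 4,
}

// Methods live in a separate 'impl' block, outside the enum body.
impl Color {
    fn is_primary(self) -> bool {
        (self as i64) < 3
    }
}

#[derive(Debug, Clone, PartialEq)]
enum MyUnionishThing {
    Val(i64),
    Str(String),
}

impl MyUnionishThing {
    // Do something based on whether Val or Str is active.
    fn foo(&self) -> String {
        match self {
            MyUnionishThing::Val(v) => format!("val is active: {}", v),
            MyUnionishThing::Str(s) => format!("str is active: {}", s),
        }
    }
}

fn main() {
    for c in [Color::Red, Color::Blue, Color::Green].iter() {
        println!("{:?} primary? {}", c, c.is_primary());
    }
    println!("{}", MyUnionishThing::Val(1).foo());
    println!("{}", MyUnionishThing::Str("s".to_string()).foo());
}
```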

Questions from Sean's mail:

Q: Should sum types have value or reference semantics?

My A: "value" seems more intuitive to me given their C-style enum roots,
       and seeming similarity to records.

Relationship between sum types and error handling
-------------------------------------------------

I'll mention that the main motivations that the team identified for 
pursuing this were:

* fix/improve our current union story (which, arguably is broken only in
   the lack of a field selection mechanism and an arguable abuse of the
   term 'union').

* a possible way of dealing with error cases in things like library
   routines

   - this led to a healthy lunchtime debate over the tradeoffs between
     sum-type Maybe/Error return types vs. optional error arguments as
     our I/O routines currently support (which could be cleaned up with
     a proposed default argument query capability) vs. a more traditional
     exception model.  But that's a completely different email thread, and
     something that I think we may task our new hire to study.  There
     are tradeoffs between safety and bulletproofness and tractability
     here that we need to spend more time wrestling through in my opinion.
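
For readers less familiar with the sum-type flavor of error handling, here 
is roughly what it looks like in Rust, where the error case is part of the 
return type.  The 'parse_port' routine is a made-up stand-in for a library 
call, not anything from our I/O routines:

```rust
// Hypothetical library routine: the error case is part of the return type.
fn parse_port(s: &str) -> Result<u16, String> {
    s.trim()
        .parse::<u16>()
        .map_err(|e| format!("bad port '{}': {}", s, e))
}

fn main() {
    // The caller must choose a branch before it can touch the value.
    for input in ["8080", "eighty"].iter() {
        match parse_port(input) {
            Ok(p) => println!("port {}", p),
            Err(msg) => println!("error: {}", msg),
        }
    }
}
```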

On Wed, 21 Jan 2015, Sean Billig wrote:

> Hi Kyle,
>
>
>>  With the Maybe example, one bit of syntax doesn't quite make sense:
>>
>
>>  `Maybe(int)` - what is `int` describing here? What happens if the enum
>> has a generic Left and Right? I think doing something along the lines of:
>>
>>      enum Or(type U, type V) {
>>          Left { var value: U; },
>>         Right { var value: V; }
>>     }
>>
>>  could work here.
>>
>
> I was following the current syntax for generic records, e.g.
> record P { var x; }
> var r : P(real);
>
> That said, I definitely prefer your suggestion of naming the generic types
> explicitly. I think this should supplant both existing options for generic
> types (i.e. both `record P` above and `record Q { type T; var x: T}` would
> become `record R(type T) { var x: T; }`).
>
>> Some open questions I see related to this topic:
>>
>>  1. Reference vs Value semantics for enums? Some situations want one or
>> the other.
>>
>
> I'd say value semantics. In cases where the user wants reference semantics,
> the value could be wrapped in something like rust's Box or Rc/Arc types
> (c++'s unique_ptr or shared_ptr types). I'm of the opinion that making
> value semantics the default (where reasonable), and making reference
> semantics a tiny bit more work leads to better code.
>
> We might want to discuss move semantics at some point, but I'm happy
> postponing that.
>
>> 2. Do we allow anonymous product types?
>>      enum Or(type U, type V) {
>>          Left( U ),
>>         Right( V, V )
>>     }
>>
>
> I have mixed feelings about this. If we require that the field names used
> in select statements match those used in the enum definition, then enum
> types defined in libraries might be awkward to use. For example, if Either
> were defined in the standard library like so:
>    enum Either(type U, type V) {
>      Left{ var value: U; },
>      Right{ var value: V; }
>    }
>
> Then the user is forced to use the name "value" in their select statement:
>    var a: Either(int, real) = foo();
>    select a {
>      when Left{value} do ...
>      when Right{value} do ...
>    }
> which might mask a variable in the enclosing scope, or might be less
> desirable than using more informative names.
>
> Maybe this is an argument against requiring that the field names used in
> `select` match those used in the enum definition...
>
> If we're going to allow anonymous products in enums, should we allow them
> to stand alone as well (i.e. tuples with a type name)? Something like
> `record X(int, string);`? How would generics look? Maybe we should adopt
> angled brackets for generic types:
>  record Pair<A, B>(A, B); // a la rust
>  record R<U, V> { var x: U; var y: V; }
>  var r: R<int, real>;
>  enum Maybe<T> { Just(T), Nothing }
>  enum Maybe<T> { Just { var value: T; }, Nothing }
>
>> Aside: I also like this syntax for record initialization it feels more
>> in-tune with the value semantics.
>>
>> For now lets stick to the normal parens, and address this as a separate
>> issue.
>>
>
> If we use positional pattern matching in `select`, and use parens for
> initialization (i.e. a constructor), we don't really need anonymous
> products. For example:
>
>    enum Either(type U, type V) {
>      Left{ var value: U; },
>      Right{ var value: V; }
>    }
>
>    var a: Either(int, string) = Right("hi");
>    select a {
>      when Left{count} do ... // meaningful name, not "value"
>      when Right{msg} do ...
>    }
>
>    enum Entity {
>      Person { var name: string; var ssn: string; },
>      Dog { var name: string; }
>    }
>    var bill = Person(name="Bill", ssn="123-45-6789");
>    // This is probably just as good as Person{name:"Bill",
> ssn:"123-45-6789"};
>
> If we do this, then anonymous products probably aren't compelling enough to
> add to the language, either within enums or outside of enums.
>
> I don't have strong feelings about any of this yet; just thinking out loud
> at this point.
>
> - Sean
>
