On Apr 29, 2019, at 2:28 PM, Alan Malloy <amal...@google.com> wrote:
Hello again, amber-spec-experts. I have another report from the
Google codebase, this time focusing on sealed types. It is
viewable in full Technicolor HTML at
http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks
again to Liam for hosting), and included below as plain text:
Author: Alan Malloy (amal...@google.com)
Published: 2019-04-29
Feedback on Sealed Types
Hello again, amber-spec-experts. I’m back with a second Google
codebase research project. I’m looking again at the Records &
Sealed Types proposal (which has now become JDK-8222777), but
this time I’m focusing on sealed types instead of records, as
promised in my RFC of a few weeks ago. My goal was to investigate
Google’s codebase to guess what developers might have done
differently if they had access to sealed types. This could help
us understand what works in the current proposal and what to
consider changing.
Unlike my previous report, this one contains more anecdotes than
statistics. It wound up being difficult to build static analysis
to categorize the interesting cases, so I mostly examined
promising candidates by hand.
Summary and Recommendations
For those who don’t care to read through all my anecdotes, I
first provide a summary of my findings, and one suggested addition.
Sealed types, as proposed so far, are a good idea in theory: Java
already has product types and open polymorphism, and sealed types
give us closed polymorphism. However, I could not find many cases
of code being written today that would be greatly enhanced if
sealed types were available. The main selling point of sealed
types for application authors is getting help from the compiler
with exhaustiveness checking, but in practice developers almost
always have a default case, because they are only interested in a
subset of the possible subclasses, and want to ignore cases they
don’t understand. This means that exhaustiveness-checking for
pattern matches would mostly go unused if developers rewrote
their existing code using sealed types.
Pattern matching is great, and can replace visitors in many
cases, but this does not depend on sealed types except for
exhaustiveness checks (which, again, would go mostly unused in
code written today). The class hierarchies for which people
define visitors today are just too large to write an exhaustive
pattern match, and so a default case would be very common.
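To make that concrete, here is a small sketch of the pattern I mean. The Shape hierarchy is invented, and the switch-over-patterns syntax is the speculative one being discussed for Amber, not anything in the language today:
public sealed interface Shape permits Circle, Rect, Polygon {}
record Circle(double radius) implements Shape {}
record Rect(double width, double height) implements Shape {}
record Polygon(double[] xs, double[] ys) implements Shape {}

String describe(Shape s) {
    return switch (s) {
        case Circle c -> "circle of radius " + c.radius();
        case Rect r -> "rectangle " + r.width() + "x" + r.height();
        // In the code I examined, a catch-all like this is nearly always
        // present, so the compiler's exhaustiveness check has nothing to do.
        default -> "some other shape";
    };
}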
The other audience for sealed types is library authors. While in
practice most developers have no great need to forbid subclasses,
perhaps it would be a boon for authors of particularly popular
libraries, who need to expose a non-final class as an
implementation detail but don’t intend for consumers to create
their own subclasses. Those authors can already include
documentation saying “you really should not extend this”, but
there is always some weirdo out there who will ignore your
warnings and then write an angry letter when the next version of
your library breaks his program (see: sun.misc.Unsafe). Authors
of such libraries would welcome the opportunity to make it truly
impossible to create undesirable subclasses.
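Purely for illustration (the class names are invented, and I am assuming the permits clause from the current draft), such an author could write:
public sealed abstract class ResponseHandler
    permits DefaultResponseHandler, StreamingResponseHandler {
    // A non-final implementation detail the library needs to extend internally,
    // but which outside code can no longer subclass.
}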
Sealed Types As a Vehicle For Sum Types
So, sealed types as-is would be an improvement, but a niche one,
used by few. I think we can get substantially more mileage out of
them if we also include a more cohesive way to explicitly define
a sum type and all its subtypes in one place with minimal
ceremony. Such a sum type could be sealed, implicitly or
explicitly. A tool like this takes what I see as the
“theoretical” advantage of sum types (closed polymorphism), and
makes it “practical” by putting it front and center. Making sums
an actual language element instead of something “implied” by
sealing a type and putting its subclasses nearby could help in a
lot of ways:
* Developers might more often realize that a sealed/sum type is a
good model for their domain. Currently it’s a “pattern” external
to the language instead of a “feature”, and many don’t realize it
could be applied to their domain. Putting it in the language
raises its profile, addressing the problem that people don’t
realize they want it.
* The compiler could provide help for defining simple
sums-of-products, while making it possible to opt into more
complicated subclasses, in much the way that enums do: the
typical enum just has bare constants like EAST, but you can add
constructor arguments or override methods when necessary.
* The ability to more easily model data in this way may result in
developers writing more classes that are amenable to
sealing/sums, as they do in other languages with explicit sum
types (Haskell, Kotlin, Scala). Then, the exhaustiveness-checking
feature that sealed types provide would pull more weight.
Since enum types are “degenerate sum types”, the syntax for
defining sums can borrow heavily from enums. A sketch of the
syntax I imagine for such things (of course, I am not married to it):
public type-enum interface BinaryTree<T> {
    Leaf {
        @Override public Stream<T> elements() {return Stream.empty();}
    },
    Node(T data, BinaryTree<T> left, BinaryTree<T> right) {
        @Override public Stream<T> elements() {
            return Stream.concat(left.elements(),
                Stream.concat(Stream.of(data), right.elements()));
        }
    };

    public Stream<T> elements();
}
Like enums, you can use a bare identifier for simple types that
exist only to be pattern-matched against, but you can add fields
and/or override blocks as necessary. The advantage over declaring
a sealed type separately from its elements is both concision (the
compiler infers visible records, superclass, and all type
parameters) and clarity: you state your intention firmly. I think
a convenient syntax like this will encourage developers to use
the powerful tool of sealed types to model their data.
Evidence in Google’s Codebase
If you are just interested in recommendations, you can stop
reading now: they are all included in the summary. What follows
is a number of anecdotes, or case studies if you prefer, that led
me to the conclusions above. Each shows a type that might have
been written as a sealed type, and hopefully highlights a
different facet of the question of how sealed types can be useful.
The first thing I looked for was classes which are often involved
in instanceof checks. As language authors, we imagine people
writing stuff like this[1] all the time:
interface Expr {int eval(Scope s);}
record Var(String name) implements Expr {
    public int eval(Scope s) {return s.get(name);}
}
record Sum(Expr left, Expr right) implements Expr {
    public int eval(Scope s) {return left.eval(s) + right.eval(s);}
}
class Analyzer {
    Stream<String> variablesUsed(Expr e) {
        if (e instanceof Var) return Stream.of(((Var)e).name());
        if (e instanceof Sum) {
            return Stream.concat(variablesUsed(((Sum)e).left()),
                variablesUsed(((Sum)e).right()));
        }
        throw new IllegalArgumentException();
    }
}
Here, the Expr interface captures some of the functionality
shared by all expressions, but later a client (Analyzer) came
along and invented some other polymorphic operations to perform
on an Expr, which Expr did not support. So Analyzer needed to do
instanceof checks instead, externalizing the polymorphism. The
principled approach would have been for Expr to export a visitor
to begin with, but perhaps it wasn’t seen as worth the trouble at
the time.
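For contrast, a minimal sketch of that principled approach, still leaning on the record syntax from footnote [1]:
interface Expr {
    int eval(Scope s);
    <R> R accept(Visitor<R> v);

    interface Visitor<R> {
        R visitVar(Var var);
        R visitSum(Sum sum);
    }
}
record Var(String name) implements Expr {
    public int eval(Scope s) {return s.get(name);}
    public <R> R accept(Visitor<R> v) {return v.visitVar(this);}
}
record Sum(Expr left, Expr right) implements Expr {
    public int eval(Scope s) {return left.eval(s) + right.eval(s);}
    public <R> R accept(Visitor<R> v) {return v.visitSum(this);}
}
With that in place, Analyzer could implement Visitor<Stream<String>> instead of chaining instanceof checks.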
To try to find this pattern in the wild, I searched for method
bodies which perform multiple instanceof checks against the same
variable. Notably, this excludes the typical equals(Object)
method, which only performs a single check. For each such
variable, I noted:
1. Its declared type
2. The set of subtypes it was checked for with instanceof
3. The common supertype of those subtypes.
I guessed that (3) would usually be the same as (1), but in
practice 55% of the time they were different. Often, the declared
type was Object, or some generic type variable which erases to
Object, while the common supertype being tested was something
like Number, Event, or Node. For example, a Container<T> knows it
will be used in some context where NaN is unsuitable, so it
checks whether its contents are Float or Double, and if so
ensures NaN is not stored. As a second example, a
serialize(Object) method checks whether its input is String or
ByteString, and throws an exception otherwise.
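Paraphrased and simplified (the method bodies here are invented; only the shape of the checks comes from the real code), that second case looks roughly like:
ByteString serialize(Object input) {
    if (input instanceof String) {
        return ByteString.copyFromUtf8((String) input);
    }
    if (input instanceof ByteString) {
        return (ByteString) input;
    }
    throw new IllegalArgumentException("cannot serialize " + input.getClass());
}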
Bad sealed types found looking at instanceof checks
I looked through the most popular declared types of these
candidates, to investigate which types are often involved in such
checks. Most of them are not good candidates for a sealed type.
Object was the most common type, followed by Exception and Throwable.
Next up is an internal DOMObject class, which sounds promising
until I tell you it has thousands of direct subclasses. Nobody is
doing exhaustive switches on this, of course. Instead, many uses
iterate over a Collection<DOMObject>, or receive a DOMObject in
some way, and just check whether it is of one or two specific
subtypes they care about. This turned out to be a very common
pattern, not just for DOMObject, but for many candidate sealed
types I found: nobody does exhaustive case analysis. They just
look for objects they understand in some much larger hierarchy,
and ignore the rest.
Some more humorous types that are often involved in instanceof
checks: java.net.InetAddress (everyone wants to know if it’s v4
or v6) and com.sun.source.tree.Tree, in our static-analysis
tools. Tree is an interesting case: here we do exactly what I
mentioned previously for DOMObject. On the surface it seems that
Tree would be a good candidate for a sealed interface with record
subtypes, but in practice I’m not sure what sealing would buy us.
We would effectively opt out of exhaustiveness-checking by having
a large default case, or by extending a visitor with
default-empty methods. Of course, sometimes we define a new
visitor to do some polymorphic operation over a Tree, but more
often we just look for one or two subtypes we care about. For
example, DataFlow inspects a Tree, but knows from context that it
is either a LambdaExpressionTree, MethodTree, or an initializer.
Plausible sealed types found looking at instanceof checks
The previous section notwithstanding, I did dig deep enough into
the results to find a few classes that could make good sealed
types. The most prominent, and most interesting, was another AST.
There is an abstract Node class for representing HTML documents.
It has just 4 subclasses defined in the same file: Text, Comment,
Tag, and EndTag. This spartan definition suggests it’s used for
something like SAX parsing, but I didn’t confirm this. It does
everything you could hope for from a type like this: it exposes a
Visitor, it provides an accept(Visitor) method, and the
superclass specifies abstract methods for a couple of the most
common things you would want to do, such as a String toHtml() method.
However, recall that I found this class by looking for classes
often involved in instanceof checks! Some people use the visitor,
but why doesn’t everyone? The first reason I found is one I’ve
mentioned several times already: clients only care about one of
the 4 cases, and may have felt creating an anonymous visitor is
too much ceremony. Would they be happy with a switch and a
default clause? Probably, but it’s hard to know for sure. The
second reason surprised me a bit: I found clients doing analysis
that isn’t really amenable to any one visitor, or a simple
pattern-match. They’ve written this:
if (mode1) { if (x instanceof Tag) {...} }
else if (mode2) { if (x instanceof Text) {...} }
The same use site cares about different subclasses at different
times, depending on some other flag(s) controlling its behavior.
Even if we offered a pattern-match on x, it’s difficult to encode
the flags correctly. They would have to match on a tuple of
(mode1, mode2, x), with a case for (true, _, Tag name) and
another for (false, true, Text text). Technically possible, but
not really prettier than what they already have, especially since
you would need to use a local record instead of an anonymous tuple.
Even so, I think this would have benefited from being a sealed
type. Recall that earlier I carefully said “4 subclasses defined
in the same file”. This is because some jokester in a different
package altogether has defined their own fifth subclass, Doctype.
They have their own sub-interface of Visitor that knows about
Doctype nodes. I can’t help but feel that the authors of Node
would have preferred to make this illegal, if they had been able to.
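Had sealing been available, the declaration could have read something like this (using the permits clause from the current draft; the member signatures are my guesses at what the report describes), and the out-of-package Doctype simply would not compile:
public sealed abstract class Node permits Text, Comment, Tag, EndTag {
    public abstract void accept(Visitor visitor);
    public abstract String toHtml();
    // ...
}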
The second good sealed type I found is almost an enum, except
that one of the instances has per-instance data. This is not
exactly a surprise, since an enum is a degenerate sum type, and
one way to think of sealed types is as a way to model sums. It
looks something like this[2]:
public abstract class DbResult<T> {
    public record NoDatabase<T>() extends DbResult<T>;
    public record RowNotFound<T>() extends DbResult<T>;
    // Four more error types ...
    public record EmptySuccess<T>() extends DbResult<T>;
    public record SuccessWithData<T>(T data) extends DbResult<T>;

    public T getData() {
        if (!(this instanceof SuccessWithData))
            throw new DbException();
        return ((SuccessWithData<T>)this).data();
    }

    public <U> DbResult<U> transform(Function<T, U> f) {
        if (!(this instanceof SuccessWithData)) {
            return (DbResult<U>)this;
        }
        return new SuccessWithData<U>(f.apply(
            ((SuccessWithData<T>)this).data()));
    }
}
Reading this code made me yearn for Haskell: here is someone who
surely wanted to write
data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t
but had to spend 120 lines defining their sum-of-products (the
extra verbosity is because really they made the subclasses
private, and defined private static singletons for each of the
error types, with a static getter to get the type parameter
right). This seems like a potential win for records and for
sealed types. Certainly my snippet was much shorter than the
actual source file because the proposed record syntax is quite
concise, so that is a real win. But what do we really gain from
sealing this type? Still nobody does exhaustive analysis even of
this relatively small type: they just use functions like getData
and transform to work with the result generically, or spot-check
a couple interesting subtypes with instanceof. Forbidding
subclassing from other packages hardly matters: nobody was
subclassing it anyway, nor would they be tempted to. Really
the improvements DbResult benefits most from are records, and
pattern-matching on records. It would be much nicer to replace
the instanceof/cast pattern with a pattern-match that extracts
the relevant field.
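A sketch of what that might look like, using the deconstruction-pattern syntax that has been floated (speculative, of course):
public <U> DbResult<U> transform(Function<T, U> f) {
    return switch (this) {
        case SuccessWithData<T>(T data) -> new SuccessWithData<U>(f.apply(data));
        default -> (DbResult<U>) this;  // every error case carries no data
    };
}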
This is the use case that inspired my idea of a type-enum, in the
Summary section above. Rewriting it as a type-enum eliminates
many of the problems: all the instanceof checks are gone, we
don’t need a bunch of extra keywords for each case, and we’re
explicit about the subclasses “belonging to” the sealed parent,
which means we get stuff like extends and <T> for free. We get
improved clarity by letting the definition of the class hierarchy
reflect its “nature” as a sum.
public abstract type-enum DbResult<T> {
    NoDatabase,
    RowNotFound,
    EmptySuccess,
    SuccessWithData(T data) {
        @Override public T getData() {
            return data;
        }
        @Override public <U> DbResult<U> transform(Function<T, U> f) {
            return new SuccessWithData<U>(f.apply(data));
        }
    };

    public T getData() {
        throw new DbException();
    }
    public <U> DbResult<U> transform(Function<T, U> f) {
        return (DbResult<U>)this;
    }
}
Visitors
Instead of doing a bunch of instanceof checks, the
“sophisticated” way to interact with a class having a small,
known set of subtypes is with a visitor. I considered doing some
complicated analysis to characterize what makes a class a
visitor, and trying to automatically cross-reference visitors to
the classes they visit...but in practice simply looking for
classes with “Visitor” in their name was a strong enough signal
that a more complicated approach was not needed. Having
identified visitors, I looked at those visitors with the most
subclasses, since each distinct subclass corresponds to one
“interaction” with the sealed type that it visits, and well-used
visitors suggest both popularity and good design.
One common theme I found: developers aren’t good at applying the
visitor pattern. Many cases I found had some weird and
inexplicable quirk compared to the “standard” visitor. These
developers will be relieved to get pattern-matching syntax so
they can stop writing visitors.
The Visiting Object
The first popular visitor I found was a bit odd to me. It’s
another tree type, but with a weird amalgam of several visitors,
and an unusual approach to its double dispatch. I have to include
a relatively lengthy code snippet to show all of its facets:
public static abstract class Node {
    public interface Visitor {
        boolean process(Node node);
    }
    public boolean visit(Object v) {
        return v instanceof Visitor
            && ((Visitor)v).process(this);
    }
    // Other methods common to all Nodes ...
}
public static final class RootNode extends Node {
    public interface Visitor {
        boolean processRoot(RootNode node);
    }
    @Override
    public boolean visit(Object v) {
        return v instanceof Visitor
            ? ((Visitor)v).processRoot(this)
            : super.visit(v);
    }
    // Other stuff about root nodes ...
}
public static abstract class ValueNode extends Node {
    public interface Visitor {
        boolean processValue(ValueNode node);
    }
    @Override
    public boolean visit(Object v) {
        return v instanceof Visitor
            ? ((Visitor)v).processValue(this)
            : super.visit(v);
    }
}
public static final class BooleanNode extends ValueNode {
    public interface Visitor {
        boolean processBool(BooleanNode node);
    }
    @Override
    public boolean visit(Object v) {
        return v instanceof Visitor
            ? ((Visitor)v).processBool(this)
            : super.visit(v);
    }
    // Other stuff about booleans ...
}
public static final class StringNode extends ValueNode {
    // Much the same as BooleanNode
}
This goes on for some time: there is a multi-layered hierarchy of
dozens of node types, each with a boolean visit(Object) method,
and their own distinct Visitor interface, in this file. I should
note that this code is actually not written by a human, but
rather generated by some process (I didn’t look into how). I
still think it is worth mentioning here for two reasons: first,
whoever wrote the code generator would probably do something
similar if writing it by hand, and second because these visitors
are used often by hand-written code.
Speaking of hand-written code, visitor subclasses now get to
declare ahead of time exactly which kinds of nodes they care
about, by implementing only the appropriate Visitor interfaces:
private class FooVisitor implements StringNode.Visitor,
BooleanNode.Visitor, RootNode.Visitor {
// ...
}
This isn’t how I would have written things, but I can sorta see
the appeal, if you don’t have to write it all by hand: a visitor
can choose to handle any one subclass of ValueNode, or all
ValueNodes, or just RootNode and StringNode, et cetera. They get
to pick and choose what sub-trees of the inheritance tree they
work with.
Would Node be a good sealed class? Maybe. It clearly intends to
enumerate all subclasses, but the benefit it gets from enforcing
that is minimal. As in my previous examples, the main advantage
for Node implementors would come from records, and the main
advantage for clients would come from pattern-matching, obviating
their need for this giant visitor.
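A FooVisitor like the one above could then shrink to a single pattern match (again using the speculative switch-pattern syntax; handleString and friends are invented names):
switch (node) {
    case StringNode s -> handleString(s);
    case BooleanNode b -> handleBool(b);
    case RootNode r -> handleRoot(r);
    // The default clause does the "ignoring" that implementing only
    // some of the Visitor interfaces accomplishes today.
    default -> {}
}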
The Enumerated Node
Another AST, this time for some kind of query language,
explicitly declares an enum of all subclasses it can have, and
uses this enum instead of using traditional double-dispatch:
public interface Node {
    enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
    Kind getKind();
    Location getLocation();
}
public abstract record AbstractNode(Location l) implements Node {}
public class Expr extends AbstractNode {
    public Kind getKind() {return Kind.EXPR;}
    // ...
}
// And so on for other Kinds ...
public abstract class Visitor {
    // Default implementations return the node unchanged; note they are not abstract.
    public Expr visitExpr(Expr e) {return e;}
    public Query visitQuery(Query q) {return q;}
    public Import visitImport(Import i) {return i;}
    public Node visit(Node n) {
        switch (n.getKind()) {
            case EXPR: return visitExpr((Expr)n);
            case QUERY: return visitQuery((Query)n);
            case IMPORT: return visitImport((Import)n);
            // ...
        }
    }
}
It’s not really clear to me why they do it this way, instead of
putting an accept(Visitor) method on Node. They gain the ability
to return different types for each Node subtype, but are hugely
restricted in what visitors can do: they must return a Node,
instead of performing an arbitrary computation. It seems like the
idea is visitors must specialize to tree rewriting, but I still
would have preferred to parameterize the visitor by return type.
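That is, something closer to the conventional shape (a sketch, not what the codebase actually contains):
public abstract class Visitor<R> {
    protected abstract R defaultAction(Node n);
    public R visitExpr(Expr e) {return defaultAction(e);}
    public R visitQuery(Query q) {return defaultAction(q);}
    public R visitImport(Import i) {return defaultAction(i);}
    // ... and so on for the other nine kinds
}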
Would this be better as a sealed type? I feel sure that if sealed
types existed, the authors of this class would have used one. We
could certainly do away with the enum, and use an
expression-switch instead to pattern-match in the implementation
of visit(Node). But I think the Visitor class would still exist,
and still have separate methods for each Node subtype, because
the developer seemed to care about specializing the return type.
The only place where an exhaustiveness check helps would be in
the visit(Node) method, inside the visitor class itself. All
other dispatch goes through visit(Node), or through one of the
specialized visitor methods if the type is known statically. It
seems like overall this would be an improvement, but again, the
improvement comes primarily from pattern-matching, not sealing.
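Concretely, the enum-based dispatch could collapse to something like this (speculative switch-pattern syntax again); this is the one spot where sealing Node would let the compiler confirm that no kind has been forgotten:
public Node visit(Node n) {
    return switch (n) {
        case Expr e -> visitExpr(e);
        case Query q -> visitQuery(q);
        case Import i -> visitImport(i);
        // ... nine more cases, and no default needed if Node is sealed
    };
}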
Colocated interface implementations
Finally, I looked for interfaces having all of their
implementations defined in the same file. On this I do have some
statistical data[3]. A huge majority (98.5%) of public interfaces
have at least one implementation in a different source file.
Package-private interfaces also tend to have implementations in
other files: 85% of them are in this category. For protected
interfaces it’s much closer: only 53% have external
implementations. Of course, all private interfaces have all
implementations in a single file.
Next, I looked at interfaces that share a source file with all
their implementations, to see whether they’d make good sealed
types. First was this Entry class:
public interface Entry {
    enum Status {OK, PENDING, FAILED}
    Status getStatus();
    int size();
    String render();
}
public class UserEntry implements Entry {
    private User u;
    private Status s;
    public UserEntry(User u, Status s) {
        this.u = u;
        this.s = s;
    }
    @Override public String render() {return u.name();}
    @Override public int size() {return 1;}
    @Override public Status getStatus() {return s;}
}
public class AccountEntry implements Entry {
    private Account a;
    private Status s;
    public AccountEntry(Account a, Status s) {
        this.a = a;
        this.s = s;
    }
    @Override public String render() {return a.render();}
    @Override public int size() {return a.size();}
    @Override public Status getStatus() {return s;}
}
A huge majority of the clients of this Entry interface treat it
polymorphically, just calling its interface methods. In only one
case is there an instanceof check made on an Entry, dispatching
to different methods depending on which subclass is present.
Is this a good sealed type? I think not, really. There are two
implementations now, but perhaps there will be a GroupEntry
someday. Existing clients should continue to work in that case:
the polymorphic Entry interface provides everything clients are
“intended” to know.
Another candidate for sealing:
public interface Request {/* Empty */}
public record RequestById(int id) implements Request;
public record RequestByValue(String owner, boolean historic) implements Request;

public class RequestFetcher {
    public List<Item> fetch(Iterable<Request> requests) {
        List<RequestById> idReqs = Lists.newArrayList();
        List<RequestByValue> valueReqs = Lists.newArrayList();
        List<Query> queries = Lists.newArrayList();
        for (Request req : requests) {
            if (req instanceof RequestById) {
                idReqs.add((RequestById)req);
            } else if (req instanceof RequestByValue) {
                valueReqs.add((RequestByValue)req);
            }
        }
        queries.addAll(prepareIdQueries(idReqs));
        queries.addAll(prepareValueQueries(valueReqs));
        return runQueries(queries);
    }
}
Interestingly, since the Request interface is empty, the only way
to do anything with this class is to cast it to one
implementation type. In fact, the RequestFetcher I include here
is the only usage of either of these classes (plus, of course,
helpers like prepareIdQueries).
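With sealing and pattern matching, the sorting loop in fetch could make that exhaustiveness explicit (speculative syntax once more):
for (Request req : requests) {
    switch (req) {
        case RequestById r -> idReqs.add(r);
        case RequestByValue r -> valueReqs.add(r);
        // No default: if Request were sealed, the compiler would flag
        // any new subtype this loop forgets to handle.
    }
}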
So, clients need to know about specific subclasses, and want to
be sure they’re doing exhaustive pattern-matching. Seems like a
great sealed class to me. Except...actually each of the two
subclasses has been extended by a decorator adding a source[4]:
public record SourcedRequestById(Source source) extends RequestById;
public record SourcedRequestByValue(Source source) extends RequestByValue;
Does this argue in favor of sealing, or against? I don’t really
know. The owners of Request clearly intended for all four of
these subclasses to exist (they’re in the same package), so they
could include them all in the permitted subtype list, but it
seems like a confusing API to expose to clients.
A third candidate for sealing is another simple sum type:
public interface ValueOrAggregatorException<T> {
    T get();

    public static <T> ValueOrAggregatorException<T> of(T value) {
        return new OfValue<T>(value);
    }
    public static <T> ValueOrAggregatorException<T> ofException(AggregatorException err) {
        return new OfException<T>(err);
    }

    private record OfValue<T>(T value)
            implements ValueOrAggregatorException<T> {
        @Override public T get() {return value;}
    }
    private record OfException<T>(AggregatorException err)
            implements ValueOrAggregatorException<T> {
        @Override public T get() {throw err;}
    }
}
It has only two subtypes, and it seems unimaginable there could
ever be a third, so why not seal it? However, the subtypes are
intentionally hidden: it is undesirable to let people see whether
there’s an exception, except by having it thrown at you. In fact
AggregatorException is documented as “plugins may throw this, but
should never catch it”: there is some higher-level thing
responsible for catching all such exceptions. So, this type gains
no benefit from exhaustiveness checks in pattern-matching. The
type is intended to be used polymorphically, through its
interface method, even though its private implementation is
amenable to sealing.
________________
[1] Throughout this document I will use record syntax as if it
were already in the language. This is merely for brevity, and to
avoid making the reader spend a lot of time reading code that
boils down to just storing a couple fields. In practice, of
course the code in Google’s codebase either defines the records
by hand, or uses an @AutoValue.
[2] Recall that @AutoValue, Google’s “record”, allows extending a
class, which is semantically okay here: DbResult has no state,
only behavior.
[3] This data is imperfect. While the Google codebase strongly
discourages having more than one version of a module checked in,
there is still some amount of “vendoring” or checking in multiple
versions of some package, e.g. for supporting external clients of
an old version of an API. As a result, two “different files”
which are really copies of each other may implement interfaces
with the same fully-qualified name; I did not attempt to control
for this case, and so such cases may look like they were in the
same file, or not.
[4] Of course in the record proposal it is illegal to extend
records like this; in real life these plain data carriers are
implemented by hand as ordinary classes, so the subtyping is legal.