Thanks Gavin.  We had an internal discussion on this today, which I will summarize here to help illuminate the issue.

As Brian mentioned in an earlier email, sealed types address two related, but
distinct, issues: (1) declaring a sum type, whereby the compiler can exploit
exhaustiveness in various places (e.g. in a switch); and (2) defining a type
that for clients behaves as if it is final (it cannot be extended), but for
the class author actually has a fixed, known collection of implementations.
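
To make use (1) concrete, here is a minimal sketch of the exhaustiveness benefit. It assumes a JDK with pattern matching for `switch` (21 or later), and the names `Node`, `Text`, `Image` and `render` are hypothetical, chosen only for illustration.

```java
class SumTypeDemo {
    // A closed sum: a Node is exactly a Text or an Image (hypothetical names).
    sealed interface Node permits Text, Image {}
    record Text(String content) implements Node {}
    record Image(String url) implements Node {}

    // No default branch: because Node is sealed, the compiler can check
    // that every permitted subtype is covered by some case.
    static String render(Node node) {
        return switch (node) {
            case Text t -> "text:" + t.content();
            case Image i -> "image:" + i.url();
        };
    }

    public static void main(String[] args) {
        System.out.println(render(new Text("hello")));
        System.out.println(render(new Image("a.png")));
    }
}
```

If a third subtype were later added to the `permits` list, the switch in `render` would stop compiling until a matching case was added, which is the property use (1) is after.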

It would only be a moderate exaggeration to say that this is really two features -- which brings us to the classic lump/split decision: should we expose this as one feature, or two?  (In the past, Alan M proposed a splitting where the sum-hierarchy case looked more like an enum declaration, for example.)  We have been leaning toward "one", but there is a risk that attempting to cover both cases makes each of them harder to understand.

The second use dictates a subtle design constraint: a type that directly
extends/implements a `sealed` type must be either `sealed`, `final` or
`non-sealed`. If not, it will be too easy to create a security hole where the
class author intended a class hierarchy to be closed, but by forgetting a
modifier at a leaf type, inadvertently renders the hierarchy open.

This is our line in the sand; it would not be OK to have an arbitrary subtype `class X implements I { }` for some sealed I, and have X end up being open for extension.
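
As an illustration of this constraint, the sketch below declares one direct subtype of each permitted kind and inspects the result reflectively. It assumes JDK 17 or later, where `Class.isSealed()` and `Class.getPermittedSubclasses()` are available; the type names and the `isClosed` helper are hypothetical.

```java
import java.lang.reflect.Modifier;

class SealedModifiersDemo {
    sealed interface I permits Leaf, Branch, Open {}

    // final: nobody can extend it.
    static final class Leaf implements I {}

    // sealed: extension is possible, but only by the listed subtypes.
    static sealed class Branch implements I permits Twig {}
    static final class Twig extends Branch {}

    // non-sealed: the author deliberately reopens this corner of the hierarchy.
    static non-sealed class Open implements I {}
    static class Anything extends Open {}

    // A direct subtype of I with none of the three modifiers, such as
    // `static class X implements I {}`, is rejected at compile time.

    // Hypothetical helper: true if the class cannot be freely extended.
    static boolean isClosed(Class<?> c) {
        return c.isSealed() || Modifier.isFinal(c.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("Leaf closed?   " + isClosed(Leaf.class));
        System.out.println("Branch closed? " + isClosed(Branch.class));
        System.out.println("Open closed?   " + isClosed(Open.class));
    }
}
```

The point of the design is that `Open` can only end up open because its author wrote `non-sealed`; there is no way to get there by omission.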

One valid design point is to stop here. All `sealed`/`non-sealed`/`final`
modifiers and `permits` clauses have to be given explicitly. The compiler then
just checks that what has been declared is correct.

Indeed, we could stop here; let's call this our baseline.  If we stopped here, we'd get the desired safety benefits, explicitness, and a reasonable lumping.  Let's be more explicit about why we might do more.

If we declare a sum hierarchy, there are three unfortunate bits of O(n) repetition:

    sealed interface X permits A, B, C {
        final class A implements X {}
        final class B implements X {}
        final class C implements X {}
    }

These three bits are: listing the subtypes twice (once at their declarations, once in the permits clause); saying "implements X" repeatedly; and saying "final" repeatedly.  (In the event the subtypes are records, the last is automatically taken care of.)

We have been exploring some alternative design points, all supporting some
sort of inference.

The inference scheme Gavin proposed addresses the first and last of these, at some cost (in both perceived complexity and implementation).  Let's drill into why we proposed this in the first place.

The case I have in mind -- which I believe will be quite common -- is a flat hierarchy (one sealed supertype, N direct subtypes) with a relatively high degree of fan-out.  (This shows up in all sorts of document tree representations.)  And of the three repetitions, my claim is that the most irritating is the permits clause.  Imagine the above hierarchy fanned out to A-Z; there's a 26-way permits clause that is both annoying to write and not particularly enlightening to read (and as a bonus, error-prone to update).  If we're going to infer anything, we should start here.

So one sensible increment atop the baseline is: (A) if a top-level type is explicitly declared sealed, and has no permits clause, we can infer the permits clause from the subtypes in the compilation unit (or more narrowly, the _nested_ subtypes).  This is simpler than the full scheme outlined, while addressing the biggest case that concerns me -- the high-fanout case.

The above scheme could be incrementally extended to (B) any class explicitly marked `sealed` -- if you say sealed and leave off `permits`, the permits list is inferred from the current compilation unit.  This seems defensible.
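
Options (A) and (B) can be sketched as follows. This is only an illustration: it relies on the behavior available in JDK 17 and later, where a sealed type with no `permits` clause takes its permitted subtypes from the same compilation unit. The types `Doc`, `Inline` and friends, and the `permitted` helper, are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class PermitsInferenceDemo {
    // Hypothetical helper: simple names of the permitted subtypes, sorted
    // so the result does not depend on declaration order.
    static List<String> permitted(Class<?> sealedType) {
        return Arrays.stream(sealedType.getPermittedSubclasses())
                .map(Class::getSimpleName)
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(permitted(Doc.class));
        System.out.println(permitted(Doc.Inline.class));
    }
}

// (A): a top-level sealed type with no permits clause; its direct subtypes
// in this compilation unit (here, nested ones) become the permitted set.
sealed interface Doc {
    record Heading(String text) implements Doc {}
    record Paragraph(String text) implements Doc {}

    // (B): a sealed type that is not top-level also leaves off permits.
    sealed interface Inline extends Doc {}
    record Bold(String text) implements Inline {}
    record Link(String url) implements Inline {}
}
```

Note that only direct subtypes are collected: `Bold` and `Link` belong to `Inline`, not to `Doc`. Adding a 27th node kind to such a hierarchy is then a one-line change, with no separate list to keep in sync.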

Where we end up with confusing action-at-a-distance is in inferring finality / sealed-ness for subtypes.  We could back off from this completely, or we could take a simpler projection, which is (C) to say any direct subtype of a sealed type is implicitly final, unless it explicitly says `sealed` or `non-sealed`.

So, if we're lumping, we could choose "baseline", or "baseline + A", or "baseline + A + B", or "baseline + A + B + C".  All seem defensible, though baseline-only seems likely to provide some ongoing irritation for the permits clause.


If we go the split direction, we have more choices, but having chased a few of these down, they all seem to arrive at muddy places.  Alan's `enum class` approach looks clean when the components of the sum are simple records, but when they are more complex classes, the expression starts to get ugly.

Another direction is to borrow the terminology (but not the semantics) from `case` classes in Scala, where we explicitly mark the subtypes, which has the effect of turning on all the complex inference, but at least making it less magic:

    sealed interface I {
        case class A { ... }
        case class B { ... }
        case class C { ... }
    }

where we'd infer the proper sealed-ness/finality, permits clauses, and implements clauses.  But, I think when we get beyond toy examples, this approach feels unlikely to offer sufficient benefit to justify itself, and the toy examples work reasonably well already (since records are already final.)


So my suggestion is to start with Baseline + (A | A&B), limiting inference to permits clauses, and see if that is enough.

