We're zeroing in on a feature list for the next iteration of pattern matching, which includes:

 - New kinds of patterns
   - Array patterns
   - Record deconstruction patterns, including varargs records
   - maybe: guard patterns (true(e), false(e))
   - maybe: AND patterns (P & Q), syntax TBD

 - New place for patterns: switch
   - Extend switch to support more types (all types?)
   - Extend switch on non-legacy types to support patterns as case labels
   - New nullity behavior (including case null)
   - New scoping + fallthrough
   - Refined exhaustiveness analysis

This document is an overview; the immediate goal is coming to consensus that this is the set of features we should be targeting next.  (Please, no deep-dives into details of, say, translation until this is locked down.)


#### Array patterns

An array pattern has the form
    T[] { CommaSeparatedListOfPatternsWithOptionalTrailingDots } OptionalIdentifier

The target type of an array pattern is T[], and it matches if:
 - the target is non-null
 - the target is an instance of T[]
 - the target has has an length equal to the arity of the nested pattern list, or, if the nested pattern list ends with a `...`, length greater than the arity of the nested pattern list
 - for each nested pattern P_i, the i'th element of the list matches P_i

It binds the union of the bindings of the nested patterns.  Like deconstruction patterns, it may optional bind a name to the target cast to the right type, if a name is provided:

    case T[] { P, Q } ts:

For multi-dimensional arrays, we interpret the pattern

    T[][] { P, Q }

as matching P and Q to the first and second elements of the outermost array:

    case T[][] { T[] a, T[] b }
or
    case T[][] { T[] { var a, var b }, T[] { var c, ... } }

Note that `T[] ts` is an ordinary type pattern, not an array pattern.


#### Record patterns

A record pattern has the form

    TypeName( CommaSeparatedListOfPatternsWithOptionalTrailingDots ) OptionalIdentifier

The type name must be a record class.  It matches if:

 - the target is non-null
 - the target is an instance of the specified record type
 - each of the nested patterns matches the corresponding component of the record, as served by the accessor method

The binding list is the union of the binding lists of the nested patterns; if the optional identifier is specified, it is bound to the target, cast to TypeName.

The treatment of null components is as per the matching rules for the nested pattern; for type patterns, if the type is total on (a supertype of) the type of the component, it is presumed to match, otherwise a dynamic (instanceof) check is performed.


#### Guard patterns

A guard pattern has the form true(BooleanExpr) or false(BooleanExpr).  It matches if the boolean expression is true or false, respectively.  It matches any target type and binds no bindings.


#### AND patterns

Two patterns P and Q can be combined into an AND pattern denoted as P&Q (syntax provisional.)  P and Q must both be applicable to the target type for P&Q to be; the binding list is the union of the binding lists.  If P does not match the target, no attempt is made to match Q.


#### Translation

I'll write VARS(P) to describe the binding variables (name and type) of a pattern P, and TR(target, P) to describe the compiler-tree desugaring of a pattern match expression.  When translating a pattern, its bindings are hoisted out to a suitable scope.  Javac has as "LetExpr" tree type, `LetExpr{S*} in e`, which means we execute statements S* and evaluate expression `e`, and its result is the result of evaluating `e`.  union_i is the union over a set indexed by i; product_i is the &&-product over the set.  |S| is the cardinality of a set.

Type patterns:
    VARS(T t) = { T t }
    TR(target, T t) = true   // for statically total type patterns, otherwise:
    TR(target, T t) = LetExpr {
                        boolean b = target instanceof T;
                        if (b) t = (T) target;
                      } in b

Record patterns:
    VARS(R(P*)) = union_i VARS(P_i)
    TR(target, R(P*)) = LetExpr {
                    boolean b = target instanceof T;
                    R res = b ? (R) target : null;
                } in (b && product_i TR(res.comp_i(), P_i))

Array patterns:
    VARS(T[] { P* }) = union_i VARS(P_i)
    TR(target, T[] { P* }) = LetExpr {
                        boolean b = target instanceof T;
                        T[] res = b ? (T[]) target : null;
                    } in (b && res.length == |P*| && product_i TR(res[i], P_i))
    TR(target, T[] { P*, ... }) = LetExpr {
                        boolean b = target instanceof T;
                        T[] res = b ? (T[]) target : null;
                    } in (b && res.length >= |P*| && product_i TR(res[i], P_i))

Guard patterns:
    VARS(true(e)) = { }
    VARS(false(e)) = { }
    TR(true(e)) = e
    TR(false(e)) = !e

AND patterns:
    VARS(P&Q) = VARS(P) union VARS(Q)
    TR(P&Q) = TR(P) && TR(Q)

The above represents an unrolling of nested patterns, where t~P(Q) is unrolled into t~P(a) && a~Q (where t~P means "P matches t").

#### Binary compatibility

In the future, it will be possible to explicitly declare deconstruction patterns in records.  In order to make adding an explicit canonical deconstruction pattern a binary-compatible operation, by the time we exit preview, some of this logic should be funneled through a condy-based pattern runtime so that old binaries that ask for the canonical deconstruction pattern for a record will get a reasonable default if no explicit one exists, or the one in the class file if one does.  This entails replacing `res.comp_i()` in the translation with calls to the pattern runtime.


## Patterns in switch

Introducing patterns into case labels creates another round of adjustments for switch, and probably some new seams between old and new functionality.

#### More types

Switch is currently limited to integral primitive types and their boxes, strings, and enums.  We'll these types "legacy switch types", and call switches on legacy switch types, for which all of whose case labels are "old-style" case labels, "legacy switches". (I'll develop the list of candidate restrictions on legacy switches as I go; ideally this list should be empty, but it probably won't be.)

Pattern matching surely lets us switch over any reference type. It is an open question whether we should fill out the three remaining primitive types (boolean, float, double.)  On the one hand, it seems oddly incomplete to be able to switch over all but three types, while on the other, switching over floats seems pretty niche, and we already have a statement for switch over booleans (`if`).

Assuming we decide for completeness, there are two interpretations of switching on float: whether this is a legacy or pattern switch.

Note that for nested patterns, we need to have type patterns for all the primitive types, so we can match against `Foo(boolean b)` or `Bar(float f)`, even if we don't allow switching on these types.

#### Patterns as case labels

The main change here, of course, is allowing patterns as case labels.  The specified pattern must be applicable to the static type of the switch operand.

One question we need to answer is how to integrate legacy case labels into the story.  There are basically three options here:

 - Retcon them.  In this approach, we define a constant literal to be a constant pattern.  Then, all switches can be interpreted as pattern switches.

 - Tolerated legacy.  With `instanceof`, rather than defining a bare type to be a pattern, we defined two forms of the `instanceof` production, a legacy one (`instanceof T`) and a pattern one.  We can do the same with case labels, and possibly then say that legacy case labels are only allowed in legacy switches.

 - Leave them to rot.  In this approach, legacy case labels are allowed on legacy switches only.

The difference between the first (retcon them as constant patterns) is whether literals are valid as patterns in contexts other than `case` labels (instanceof, nested patterns.)  I am steadily converging on the conclusion that constant patterns are an "obvious but wrong" idea, more "cute and clever" than sound language design.

Note that with a sensible guard feature, constant patterns become pure syntactic shortcut; `Foo(0)` can be expressed as `Foo(var x) & true(x == 0)` in any pattern context.  That's not to say that the benefit of the shortcut is zero, but there's also a cost; further user confusion as to whether `id(0)` is an invocation of a method with a constant, or matching a pattern with a nested constant pattern.  (If we decide we really want constant patterns because the nestability leads to better readability, I prefer we find another way to denote them, such as `== 0` or `const(0)` or such.)

So, let's assume that we're going to be somewhere on the spectrum between "tolerate" and "rot".  There are two questions to answer here:

 - In switches on legacy types, do we allow a mix of constant cases and pattern cases?  (Note that this only makes a difference when we have more interesting patterns that can apply to these types, which mostly means waiting for declared patterns.)

 - In non-legacy-switches, do we allow constant cases at all?

If we say "yes" to either of these, then essentially we treat any pattern with a mix of these as a pattern switch, and lower `case C` to `case TargetType t & true(t.equals(C))` or similar.

#### Nullity

We've gone through an extensive discussion (actually, many extensive discussions) on nullity in switch.  It took several rounds to get over the "why can't we just always throw on null", but I hope we're there now.  It seems that the sensible strategy is:

 - At the top level of a switch, total type patterns, and the new `case null`, accept null, and no others.  (Default does not accept null, but you can fall out of a `case null` into a default.)  If a switch has no null-accepting patterns, then the switch throws NPE preemptively (simulating current behavior), as if there is an invisible trailing (possibly dead) `case null: NPE` in all switches.

This strategy is fully backward compatible; anything that throws today, will throw tomorrow.  Total patterns must come last, so only the last pattern, or an explicit `case null`, will see a null.

For nested patterns, we've already defined the semantics of pattern matching such that pattern matching never NPEs, and never introduces any failures to match null that don't derive from an underlying dynamic type test.  The problem is solely with expectations having to do with the top level of switch, based on historical preemptive null checks.

#### Scoping and fallthrough

Having decided to backtrack on merging binding variables in instanceof (`x instanceof Foo(var a) || x instanceof Bar(var a))`), the corresponding backtracking is to restrict fallthrough such that you cannot fall into, or out of, a case clause that has bindings.

This may require some adjustment in the scoping of binding variables, which is probably not yet fully defined for switches, to ensure that their scope ends at the next case declaration, so that we don't have a problem with reusing names:

    case Foo(var x): break;
    case Bar(var x): // OK to use `x` here

It is OK to fall out out of a case without bindings into a case without bindings, such as falling out of `case null` into `default`.

#### Exhaustiveness

There's a definition of a set of patterns being total on a target type, outlined here: https://github.com/openjdk/amber-docs/blob/master/site/design-notes/type-patterns-in-switch.md.

The main new feature here is allowing pattern sets that are (optimistically) total on the target to be considered total in a switch expression, and insert the appropriate throwing catch-call to handle the remainder.

Secondarily, we still have a question to confront about how to make statement switches total like expression switches, so that users can opt into stricter type checking.

#### Translation considerations

Legacy switches on primitives can continue to get translated as they are now.

Pattern switches are lowered to densely numbered int switches, and we use an indy-based classifier to select the candidate case.  The bootstrap takes as its static argument list a list of Class objects (or null), and as its dynamic arguments, the switch target and the case number to start at.  For example, we translate:

    switch (o) {
        case Telegram t: A
        case Integer i && true(i > 0): B
        case Integer i: C
        case Foo(Bar(var x)): D
        case "moose": ...
    }

as (pseudocode):

    int index = -1;
    LOOP:
    while (true) {
        index = indy[bootstrap=SwitchBootstrap[Telegram.class, Integer.class, Integer.class, Foo.class, String.class]](o, index);
        switch (idx) {
            case 0: A
            case 1:
                if (!(i > 0))
                    continue LOOP;
                B;
            case 2: C;
            case 3:
                if (!(fooBinding1 instanceof Bar(var x))
                    continue LOOP;
                D
            case 4:
                if (!(o.equals("moose"))
                    continue LOOP;
        }
        break LOOP;
    }

The classifier builds a decision tree based on types, but does not include nested patterns or guards -- these are added by the desugaring.  If there is anything that causes us to enter a case and then later decide "whoops", we break out of the switch, and re-run the classifier, starting at the last index we tried. (Earlier experiments suggested this was the sweet spot; while we can accurately model patterns and guards as constant bundles of method handles, the complexity is high and the benefit is not so high.

For record/array patterns, we use the classifier to select the type, and then unroll any nested patterns into the switch body.  For ANDed patterns, we classify on the type of the first pattern and unroll the rest into the switch body.

For legacy switches on enums and strings, we have the option of replacing the current desugaring with an indy-based classification as we do with pattern switches, to move complex desugaring code from the compiler to simpler runtime code.  We can do this now, later, or never.


#### Open issues

We have a few open issues to sync on, though almost all can probably be kicked down the road past first preview:

 - Boundaries between legacy switches and pattern switches
   - allow mixing of constant and pattern cases?
   - allow switch on float/double/boolean with constant labels?
 - Allow statement switches to opt into totality?  How?


Reply via email to