Pattern features for next iteration

Brian Goetz Tue, 19 Jan 2021 11:59:34 -0800

We're zeroing in on a feature list for the next iteration of patternmatching, which includes:


 - New kinds of patterns
   - Array patterns
   - Record deconstruction patterns, including varargs records
   - maybe: guard patterns (true(e), false(e))
   - maybe: AND patterns (P & Q), syntax TBD


 - New place for patterns: switch
   - Extend switch to support more types (all types?)
   - Extend switch on non-legacy types to support patterns as case labels
   - New nullity behavior (including case null)
   - New scoping + fallthrough
   - Refined exhaustiveness analysis

This document is an overview; the immediate goal is coming to consensusthat this is the set of features we should be targeting next. (Please,no deep-dives into details of, say, translation until this is locked down.)



#### Array patterns

An array pattern has the form

T[] { CommaSeparatedListOfPatternsWithOptionalTrailingDots }OptionalIdentifier


The target type of an array pattern is T[], and it matches if:
 - the target is non-null
 - the target is an instance of T[]

- the target has has an length equal to the arity of the nestedpattern list, or, if the nested pattern list ends with a `...`, lengthgreater than the arity of the nested pattern list

 - for each nested pattern P_i, the i'th element of the list matches P_i

It binds the union of the bindings of the nested patterns. Likedeconstruction patterns, it may optional bind a name to the target castto the right type, if a name is provided:


    case T[] { P, Q } ts:

For multi-dimensional arrays, we interpret the pattern

    T[][] { P, Q }

as matching P and Q to the first and second elements of the outermost array:

    case T[][] { T[] a, T[] b }
or
    case T[][] { T[] { var a, var b }, T[] { var c, ... } }

Note that `T[] ts` is an ordinary type pattern, not an array pattern.


#### Record patterns

A record pattern has the form

TypeName( CommaSeparatedListOfPatternsWithOptionalTrailingDots )OptionalIdentifier


The type name must be a record class.  It matches if:

 - the target is non-null
 - the target is an instance of the specified record type

- each of the nested patterns matches the corresponding component ofthe record, as served by the accessor method

The binding list is the union of the binding lists of the nestedpatterns; if the optional identifier is specified, it is bound to thetarget, cast to TypeName.

The treatment of null components is as per the matching rules for thenested pattern; for type patterns, if the type is total on (a supertypeof) the type of the component, it is presumed to match, otherwise adynamic (instanceof) check is performed.



#### Guard patterns

A guard pattern has the form true(BooleanExpr) or false(BooleanExpr). It matches if the boolean expression is true or false, respectively. Itmatches any target type and binds no bindings.



#### AND patterns

Two patterns P and Q can be combined into an AND pattern denoted as P&Q(syntax provisional.) P and Q must both be applicable to the targettype for P&Q to be; the binding list is the union of the binding lists. If P does not match the target, no attempt is made to match Q.



#### Translation

I'll write VARS(P) to describe the binding variables (name and type) ofa pattern P, and TR(target, P) to describe the compiler-tree desugaringof a pattern match expression. When translating a pattern, its bindingsare hoisted out to a suitable scope. Javac has as "LetExpr" tree type,`LetExpr{S*} in e`, which means we execute statements S* and evaluateexpression `e`, and its result is the result of evaluating `e`. union_iis the union over a set indexed by i; product_i is the &&-product overthe set. |S| is the cardinality of a set.


Type patterns:
    VARS(T t) = { T t }

TR(target, T t) = true // for statically total type patterns,otherwise:

    TR(target, T t) = LetExpr {
                        boolean b = target instanceof T;
                        if (b) t = (T) target;
                      } in b

Record patterns:
    VARS(R(P*)) = union_i VARS(P_i)
    TR(target, R(P*)) = LetExpr {
                    boolean b = target instanceof T;
                    R res = b ? (R) target : null;
                } in (b && product_i TR(res.comp_i(), P_i))

Array patterns:
    VARS(T[] { P* }) = union_i VARS(P_i)
    TR(target, T[] { P* }) = LetExpr {
                        boolean b = target instanceof T;
                        T[] res = b ? (T[]) target : null;

} in (b && res.length == |P*| && product_iTR(res[i], P_i))

    TR(target, T[] { P*, ... }) = LetExpr {
                        boolean b = target instanceof T;
                        T[] res = b ? (T[]) target : null;

} in (b && res.length >= |P*| && product_iTR(res[i], P_i))


Guard patterns:
    VARS(true(e)) = { }
    VARS(false(e)) = { }
    TR(true(e)) = e
    TR(false(e)) = !e

AND patterns:
    VARS(P&Q) = VARS(P) union VARS(Q)
    TR(P&Q) = TR(P) && TR(Q)

The above represents an unrolling of nested patterns, where t~P(Q) isunrolled into t~P(a) && a~Q (where t~P means "P matches t").


#### Binary compatibility

In the future, it will be possible to explicitly declare deconstructionpatterns in records. In order to make adding an explicit canonicaldeconstruction pattern a binary-compatible operation, by the time weexit preview, some of this logic should be funneled through acondy-based pattern runtime so that old binaries that ask for thecanonical deconstruction pattern for a record will get a reasonabledefault if no explicit one exists, or the one in the class file if onedoes. This entails replacing `res.comp_i()` in the translation withcalls to the pattern runtime.



## Patterns in switch

Introducing patterns into case labels creates another round ofadjustments for switch, and probably some new seams between old and newfunctionality.


#### More types

Switch is currently limited to integral primitive types and their boxes,strings, and enums. We'll these types "legacy switch types", and callswitches on legacy switch types, for which all of whose case labels are"old-style" case labels, "legacy switches". (I'll develop the list ofcandidate restrictions on legacy switches as I go; ideally this listshould be empty, but it probably won't be.)

Pattern matching surely lets us switch over any reference type. It is anopen question whether we should fill out the three remaining primitivetypes (boolean, float, double.) On the one hand, it seems oddlyincomplete to be able to switch over all but three types, while on theother, switching over floats seems pretty niche, and we already have astatement for switch over booleans (`if`).

Assuming we decide for completeness, there are two interpretations ofswitching on float: whether this is a legacy or pattern switch.

Note that for nested patterns, we need to have type patterns for all theprimitive types, so we can match against `Foo(boolean b)` or `Bar(floatf)`, even if we don't allow switching on these types.


#### Patterns as case labels

The main change here, of course, is allowing patterns as case labels. The specified pattern must be applicable to the static type of theswitch operand.

One question we need to answer is how to integrate legacy case labelsinto the story. There are basically three options here:

- Retcon them. In this approach, we define a constant literal to be aconstant pattern. Then, all switches can be interpreted as patternswitches.

- Tolerated legacy. With `instanceof`, rather than defining a baretype to be a pattern, we defined two forms of the `instanceof`production, a legacy one (`instanceof T`) and a pattern one. We can dothe same with case labels, and possibly then say that legacy case labelsare only allowed in legacy switches.

- Leave them to rot. In this approach, legacy case labels are allowedon legacy switches only.

The difference between the first (retcon them as constant patterns) iswhether literals are valid as patterns in contexts other than `case`labels (instanceof, nested patterns.) I am steadily converging on theconclusion that constant patterns are an "obvious but wrong" idea, more"cute and clever" than sound language design.

Note that with a sensible guard feature, constant patterns become puresyntactic shortcut; `Foo(0)` can be expressed as `Foo(var x) & true(x ==0)` in any pattern context. That's not to say that the benefit of theshortcut is zero, but there's also a cost; further user confusion as towhether `id(0)` is an invocation of a method with a constant, ormatching a pattern with a nested constant pattern. (If we decide wereally want constant patterns because the nestability leads to betterreadability, I prefer we find another way to denote them, such as `== 0`or `const(0)` or such.)

So, let's assume that we're going to be somewhere on the spectrumbetween "tolerate" and "rot". There are two questions to answer here:

- In switches on legacy types, do we allow a mix of constant cases andpattern cases? (Note that this only makes a difference when we havemore interesting patterns that can apply to these types, which mostlymeans waiting for declared patterns.)


 - In non-legacy-switches, do we allow constant cases at all?

If we say "yes" to either of these, then essentially we treat anypattern with a mix of these as a pattern switch, and lower `case C` to`case TargetType t & true(t.equals(C))` or similar.


#### Nullity

We've gone through an extensive discussion (actually, many extensivediscussions) on nullity in switch. It took several rounds to get overthe "why can't we just always throw on null", but I hope we're therenow. It seems that the sensible strategy is:

- At the top level of a switch, total type patterns, and the new `casenull`, accept null, and no others. (Default does not accept null, butyou can fall out of a `case null` into a default.) If a switch has nonull-accepting patterns, then the switch throws NPE preemptively(simulating current behavior), as if there is an invisible trailing(possibly dead) `case null: NPE` in all switches.

This strategy is fully backward compatible; anything that throws today,will throw tomorrow. Total patterns must come last, so only the lastpattern, or an explicit `case null`, will see a null.

For nested patterns, we've already defined the semantics of patternmatching such that pattern matching never NPEs, and never introduces anyfailures to match null that don't derive from an underlying dynamic typetest. The problem is solely with expectations having to do with the toplevel of switch, based on historical preemptive null checks.


#### Scoping and fallthrough

Having decided to backtrack on merging binding variables in instanceof(`x instanceof Foo(var a) || x instanceof Bar(var a))`), thecorresponding backtracking is to restrict fallthrough such that youcannot fall into, or out of, a case clause that has bindings.

This may require some adjustment in the scoping of binding variables,which is probably not yet fully defined for switches, to ensure thattheir scope ends at the next case declaration, so that we don't have aproblem with reusing names:


    case Foo(var x): break;
    case Bar(var x): // OK to use `x` here

It is OK to fall out out of a case without bindings into a case withoutbindings, such as falling out of `case null` into `default`.


#### Exhaustiveness

There's a definition of a set of patterns being total on a target type,outlined here:https://github.com/openjdk/amber-docs/blob/master/site/design-notes/type-patterns-in-switch.md.

The main new feature here is allowing pattern sets that are(optimistically) total on the target to be considered total in a switchexpression, and insert the appropriate throwing catch-call to handle theremainder.

Secondarily, we still have a question to confront about how to makestatement switches total like expression switches, so that users can optinto stricter type checking.


#### Translation considerations

Legacy switches on primitives can continue to get translated as they arenow.

Pattern switches are lowered to densely numbered int switches, and weuse an indy-based classifier to select the candidate case. Thebootstrap takes as its static argument list a list of Class objects (ornull), and as its dynamic arguments, the switch target and the casenumber to start at. For example, we translate:


    switch (o) {
        case Telegram t: A
        case Integer i && true(i > 0): B
        case Integer i: C
        case Foo(Bar(var x)): D
        case "moose": ...
    }

as (pseudocode):

    int index = -1;
    LOOP:
    while (true) {

index = indy[bootstrap=SwitchBootstrap[Telegram.class,Integer.class, Integer.class, Foo.class, String.class]](o, index);

        switch (idx) {
            case 0: A
            case 1:
                if (!(i > 0))
                    continue LOOP;
                B;
            case 2: C;
            case 3:
                if (!(fooBinding1 instanceof Bar(var x))
                    continue LOOP;
                D
            case 4:
                if (!(o.equals("moose"))
                    continue LOOP;
        }
        break LOOP;
    }

The classifier builds a decision tree based on types, but does notinclude nested patterns or guards -- these are added by the desugaring. If there is anything that causes us to enter a case and then laterdecide "whoops", we break out of the switch, and re-run the classifier,starting at the last index we tried. (Earlier experiments suggested thiswas the sweet spot; while we can accurately model patterns and guards asconstant bundles of method handles, the complexity is high and thebenefit is not so high.

For record/array patterns, we use the classifier to select the type, andthen unroll any nested patterns into the switch body. For ANDedpatterns, we classify on the type of the first pattern and unroll therest into the switch body.

For legacy switches on enums and strings, we have the option ofreplacing the current desugaring with an indy-based classification as wedo with pattern switches, to move complex desugaring code from thecompiler to simpler runtime code. We can do this now, later, or never.



#### Open issues

We have a few open issues to sync on, though almost all can probably bekicked down the road past first preview:


 - Boundaries between legacy switches and pattern switches
   - allow mixing of constant and pattern cases?
   - allow switch on float/double/boolean with constant labels?
 - Allow statement switches to opt into totality?  How?

Pattern features for next iteration

Reply via email to