Very comprehensive.  Four groups of comments below (one at the very bottom).

> On Apr 6, 2018, at 11:51 AM, Brian Goetz <brian.go...@oracle.com> wrote:
> 
> The following outlines our story for translating improved switches, including 
> both the switch improvements coming as part of JEP 325, and follow-on work to 
> add pattern matching to switches.  Much of this has been discussed already 
> over the last year, but here it is in one place.
> 
> # Switch Translation
> #### Maurizio Cimadamore and Brian Goetz 
> #### April 2018
> 
> ## Part 1 -- constant switches
> 
> This part examines the current translation of `switch` constructs by `javac`, 
> and proposes a more general translation for switching on primitives, boxes, 
> strings, and enums, with the goals of: 
> 
>  - Unify the treatment of `switch` variants, simplifying the compiler 
> implementation and reducing the static footprint of generated code; 
>  - Move responsibility for target classification from compile time to run 
> time, allowing us to more freely update the logic without updating the 
> compiler.
> 
> ## Current translation 
> 
> Switches on `int` (and the smaller integer primitives) are translated in one 
> of two ways.  If the labels are relatively dense, we translate an `int` 
> switch to a `tableswitch`; if they are sparse, we translate to a 
> `lookupswitch`.  The current heuristic appears to be that we use a 
> `tableswitch` if it results       in a smaller bytecode than a `lookupswitch` 
> (which uses twice as many bytes per entry), which is a reasonable heuristic. 
> 
> #### Switches on boxes 
> 
> Switches on primitive boxes are currently implemented as if they were 
> primitive switches, unconditionally unboxing the target before entry 
> (possibly throwing NPE). 
> 
> #### Switches on strings 
> 
> Switches on strings are implemented as a two-step process, exploiting the 
> fact that strings cache their `hashCode()` and that hash codes are reasonably 
> spread out. Given a switch on strings like the one below: 
> 
>     switch (s) { 
>         case "Hello": ... 
>         case "World": ... 
>         default: ... 
>     } 
> 
> The compiler desugar this into two separate switches, where the first switch 
> maps the input strings into a range of numbers [0..1], as shown below, which 
> can then be used in a subsequent plain switch on ints.  The generated code 
> unconditionally calls `hashCode()`, again possibly throwing NPE. 
> 
>     int index=-1; 
>     switch (s.hashCode()) { 
>         case 12345: if (!s.equals("Hello")) break; index = 1; break; 
>         case 6789: if (!s.equals("World")) break; index = 0; break; 
>         default: index = -1; 
>     } 
>     switch (index) { 
>         case 0: ... 
>         case 1: ... 
>         default: ... 
>     } 
> 
> If there are hash collisions between the strings, the first switch must try 
> all possible matching strings. 

Minor point: unclear why the default case has to assign -1 to index, when it it 
already initialized to -1.

I see why you use this structure, because it fits a general paradigm of first 
mapping to an integer.

However, a post-optimization might be able to turn such a structure, where the 
assignments to “index” are all constants rather than the results of calling 
some opaque classifier method, into a control structure with a single switch 
statement and no use of the intermediate integer encoding.  I’ll show a more 
general example to give you the idea:

    switch (s) { 
        case "Hello": stmts1;
        case "World": stmts2;
        case “Goodbye": stmts3;
        default: stmtsD;
    } 

and suppose that “Hello” and “Goodbye” happen to have the same hashcode.  It 
might be transformed into:

    int index=-1; 
    switch (s.hashCode()) { 
        case 12345: if (s.equals("Hello")) index = 0; else if 
(s.equals(“Goodbye")) index = 2; break; 
        case 6789: if (s.equals("World")) index = 1; break; 
        default: break; 
    } 
    switch (index) { 
        case 0: stmts1;
        case 1: stmts2;
        case 2: stmts3;
        default: stmtsD;
    } 

I now suggest that a post-optimization might then turn this into:

  SUCCESS: {
    DEFAULT: {
      switch (s.hashCode()) { 
        case 12345: if (s.equals("Hello")) { stmts1; break SUCCESS; } else if 
(s.equals(“Goodbye")) { stmts3; break SUCCESS; } else break DEFAULT;
        case 6789: if (s.equals("World")) { stmts2; break SUCCESS; } else break 
DEFAULT;
        default: break DEFAULT; 
      }
      break SUCCESS;
    }
    do { stmtsD; } while(0);
  } 

where `SUCCESS` and `DEFAULT` are suitably generated fresh statement labels.  
(You might think that simply

  SUCCESS: {
      switch (s.hashCode()) { 
        case 12345: if (s.equals("Hello")) { stmts1; break SUCCESS; } else if 
(s.equals(“Goodbye")) { stmts3; break SUCCESS; } else break;
        case 6789: if (s.equals("World")) { stmts2; break SUCCESS; } else break;
        default: break; 
      }
    stmtsD;
  } 

would do the job, but the first version, using a `DEFAULT` label and a dummy 
`do` statement, allows for stmts1, stmts2, stmts3, and stmtsD to contain 
`break` statements.  Of course, that’s assuming use of only surface syntax to 
express the transformations; the compiler can probably be smarter than that in 
practice.)

> #### Switches on enums 
> 
> Switches on `enum` constants exploit the fact that enums have (usually dense) 
> integral ordinal values.  Unfortunately, because an ordinal value can change 
> between compilation time and runtime, we cannot rely on this mapping 
> directly, but instead need to do an extra layer of mapping.  Given a switch 
> like: 
> 
>     switch(color) { 
>         case RED: ... 
>         case GREEN: ... 
>     } 
> 
> The compiler numbers the cases starting a 1 (as with string switch), and 
> creates a synthetic class that maps the runtime values of the enum ordinals 
> to the statically numbered cases:

Inconsistency: in the string example above, you actually numbered the cases 0 
and 1, not 1 and 2.

> 
>     class Outer$0 { 
>         synthetic final int[] $EnumMap$Color = new 
> int[Color.values().length]; 
>         static { 
>             try { $EnumMap$Color[RED.ordinal()] = 1; } catch 
> (NoSuchFieldError ex) {} 
>             try { $EnumMap$Color[GREEN.ordinal()] = 2; } catch 
> (NoSuchFieldError ex) {} 
>         } √
>     } 
> 
> Then, the switch is translated as follows: 
> 
>     switch(Outer$0.$EnumMap$Color[color.ordinal()]) { 
>         case 1: stmt1; 
>         case 2: stmt2 
>     } 

Presumably for this example the chosen integers start with 1 rather than 0, so 
that if any element of the array is not explicitly initialized by Outer$0, its 
default 0 value will not be confused with an actual enum value.  This subtle 
point should be mentioned explicitly.

An interesting question is whether a “case 0: throw new Exception(…);” should 
be supplied, on the grounds that it’s okay for the programmer to ignore enum 
values presumably known at compile time, but not to ignore values that sneaked 
in later?  (If the desire is really to ignore all such values, known and 
unknown, the programmer can always write “default: break;”.)

> In other words, we construct an array whose size is the cardinality of the 
> enum, and then the element at position *i* of such array will contain the 
> case index corresponding to the enum constant with whose ordinal is *i*. 
> 
> ## A more general scheme 
> 
> The handling of strings and enums give us a hint of how to create a more 
> regular scheme; for `switch` targets more complex than `int`, we lower the 
> `switch` to an `int` switch with consecutive `case` labels, and use a 
> separate process to map the target into the range of synthetic case labels. 
> 
> Now that we have `invokedynamic` in our toolbox, we can reduce all of the 
> non-`int` cases to a single form, where we number the cases with consecutive 
> integers, and perform case selection via an `invokedynamic`-based classifier 
> function, whose static argument list receives a description of the actual 
> targets, and which returns an `int` identifying what `case` to select. 
> 
> This approach has several advantages: 
>  - Reduced compiler complexity -- all switches follow a common pattern; 
>  - Reduced static code size; 
>  - The classification function can select from a wide range of strategies 
> (linear search, binary search, building a `HashMap`, constructing a perfect 
> hash function, etc), which can vary over time or from situation to situation; 
>  - We are free to improve the strategy or select an alternate strategy (say, 
> to optimize for startup time) without having to recompile the code; 
>  - Hopefully at least, if not more, JIT-friendly than the existing 
> translation. 
> 
> We can also use this approach in preference to `lookupswitch` for non-dense 
> `int` switches, as well as use it to extend `switch` to handle `long`, 
> `float`, and `double` targets (which were surely excluded in part because the 
> JVM didn't provide a convenient translation target for these types.) 
> 
> #### Bootstrap design 
> 
> When designing the `invokedynamic` bootstraps to support this translation, we 
> face the classic lumping-vs-splitting decision. For now, we'll bias towards 
> splitting.  In the following example, `BOOTSTRAP_PREAMBLE` indicates the 
> usual leading arguments for an indy bootstrap.  We assume the compiler has 
> numbered the case values densely from 0..N, and the bootstrap will return 
> [0,n) for success, or N for "no match". 
> 
> A strawman design might be: 
> 
>     // Numeric switches for P, accepts invocation as P -> I or Box(P) -> I 
>     CallSite intSwitch(BOOTSTRAP_PREAMBLE, int... caseValues) 
> 
>     // Switch for String, invocation descriptor is String -> I 
>     CallSite stringSwitch(BOOTSTRAP_PREAMBLE, String... caseValues) 
> 
>     // Switch for Enum, invocation descriptor is E -> I 
>     CallSite enumSwitch(BOOTSTRAP_PREAMBLE, Class<Enum<E extends Enum<E>>> 
> clazz, 
>                         String... caseNames) 
> 
> It might be possible to encode all of these into a single bootstrap, but 
> given that the compiler already treats each type slightly differently, it 
> seems there is little value in this sort of lumping for non-pattern switches. 
> 
> The `enumSwitch` bootstrap as proposed uses `String` values to describe the 
> enum constants, rather than encoding the enum constants directly via condy.  
> This allows us to be more robust to enums disappearing after compilation. 
> 
> This strategy is also dependent on having broken the limitation on 253 
> bootstrap arguments in indy/condy. 
> 
> #### Extending to other primitive types 
> 
> This approach extends naturally to other primitive types (long, double, 
> float), by the addition of some more bootstraps (which need to deal with the 
> additional complexities of infinity, NaN, etc): 
> 
>     CallSite longSwitch(BOOTSTRAP_PREAMBLE, long... caseValues) 
>     CallSite floatSwitch(BOOTSTRAP_PREAMBLE, float... caseValues) 
>     CallSite doubleSwitch(BOOTSTRAP_PREAMBLE, double... caseValues) 
> 
> #### Extending to null 
> 
> The scheme as proposed above does not explicitly handle nulls, which is a 
> feature we'd like to have in `switch`.  There are a few ways we could add 
> null handling into the API: 
> 
>  - Split entry points into null-friendly or null-hostile switches; 
>  - Find a way to encode nulls in the array of case values (which can be done 
> with condy); 
>  - Always treat null as a possible input and a distinguished output, and have 
> the compiler ensure the switch can handle this distinguished output. 
> 
> The last strategy is appealing and straightforward; assign a sentinel value 
> (-1) to `null`, and always return this sentinel when the input is null.  The 
> compiler ensures that some case handles `null`, and if no case handles `null` 
> then it inserts an implicit 
> 
>     case -1: throw new NullPointerException(); 
> 
> into the generated code. 
> 
> #### General example 
> 
> If we have a string switch: 
> 
>     switch (x) { 
>         case "Foo": m(); break; 
>         case "Bar": n(); // fall through 
>         case "Baz": r(); break; 
>         default: p(); 
>     } 
> 
> we translate into: 
> 
>     int t = indy[bsm=stringSwitch["Foo", "Bar", "Baz"]](x) 
>     switch (t) { 
>         case -1: throw new NullPointerException();  // implicit null case 
>         case 0: m(); break; 
>         case 1: n(); // fall through 
>         case 2: r(); break; 
>         case 3: p();                                // default case 
>     } 
> 
> All switches, with the exception of `int` switches (and maybe not even 
> non-dense `int` switches), follow this exact pattern.  If the target type is 
> not a reference type, the `null` case is not needed. 
> 
> This strategy is implemented in the `switch` branch of the amber repository; 
> see `java.lang.runtime.SwitchBootstraps` in that branch for (rough!) 
> implementations of the bootstraps.
> 
> ## Patterns in narrow-target switches 
> 
> When we add patterns, we may encounter switches whose targets are tightly 
> typed (e.g., `String` or `int`) but still use some patterns in their 
> expression.  For switches whose target type is a primitive, primitive box, 
> `String`, or `enum`, we'd like to use the optimized translation strategy 
> outlined here, but the following kinds of patterns might still show up in a 
> switch on, say, `Integer`: 
> 
>     case var x: 
>     case _: 
>     case Integer x: 
>     case Integer(var x): 
> 
> The first three can be translated away by the source compiler, as they are 
> semantically equivalent to `default`.  If any nontrivial patterns are present 
> (including deconstruction patterns), we may need to translate as a pattern 
> switch scheme -- see Part 2.  (While the language may not distinguish between 
> "legacy" and "pattern" switches -- in that all switches are pattern switches 
> -- we'd like to avoid giving up obvious optimizations if we can.) 
> 
> # Part 2 -- type test patterns and guards 
> 
> A key motivation for reexamining switch translation is the impending arrival 
> of patterns in switch.  We expect switch translation for the pattern case to 
> follow a similar structure -- lower to an `int` switch and use an indy-based 
> classifier to select an index.  However, there are a few additional 
> complexities.  One is that pattern cases may have guards, which means we need 
> to be able to re-enter the bootstrap with an indication to "continue matching 
> from case N", in the event of a failed guard. (Even if the language doesn't 
> support guards directly, the obvious implementation strategy for nested 
> patterns is to desugar them into guards.)
> 
> Translating pattern switches is more complicated because there are more 
> options for how to divide the work between the statically generated code and 
> the switch classifier, and different choices have different performance 
> side-effects (are binding variables "boxed" into a tuple to be returned, or 
> do they need to be redundantly calculated).  
> 
> ## Type-test patterns 
> 
> Type-test patterns are notable because their applicability predicate is 
> purely based on the type system, meaning that the compiler can directly 
> reason about it both statically (using flow analysis, optimizing away dynamic 
> type tests) and dynamically (with `instanceof`.)  A switch involving 
> type-tests: 
> 
>     switch (x) { 
>         case String s: ... 
>         case Integer i: ... 
>         case Long l: ... 
>     } 
> 
> can (among other strategies) be translated into a chain of `if-else` using 
> `instanceof` and casts: 
> 
>     if (x instanceof String) { String s = (String) x; ... } 
>     else if (x instanceof Integer) { Integer i = (Integer) x; ... } 
>     else if (x instanceof Long) { Long l = (Long) x; ... } 
> 
> #### Guards 
> 
> The `if-else` desugaring can also naturally handle guards: 
> 
>     switch (x) { 
>         case String s 
>             where (s.length() > 0): ... 
>         case Integer i 
>             where (i > 0): ... 
>         case Long l 
>             where (l > 0L): ... 
>     } 
> 
> can be translated to: 
> 
>     if (x instanceof String 
>         && ((String) x).length() > 0) { String s = (String) x; ... } 
>     else if (x instanceof Integer 
>              && ((Integer) x) > 0) { Integer i = (Integer) x; ... } 
>     else if (x instanceof Long 
>              && ((Long) x) > 0L) { Long l = (Long) x; ... } 
> 
> #### Performance concerns 
> 
> The translation to `if-else` chains is simple (for switches without 
> fallthrough), but is harder for the VM to optimize, because we've used a more 
> general control flow mechanism.  If the target is an empty `String`, which 
> means we'd pass the first `instanceof` but fail the guard, class-hierarchy 
> analysis could tell us that it can't possibly be an `Integer` or a `Long`, 
> and so there's no need to perform those tests. But generating code that takes 
> advantage of this information is more complex. 
> 
> In the extreme case, where a switch consists entirely of type test patterns 
> for final classes, this could be performed as an O(1) operation by hashing.  
> And this is a common case involving switches over alternatives in a sum 
> (sealed) type.  (We shouldn't rely on finality at compile time, as this can   
>         change between compile and run time, but we should take advantage of 
> this at run time if we can.) 
> 
> Finally, the straightforward static translation may miss opportunities for 
> optimization.  For example: 
> 
>     switch (x) { 
>         case Point p 
>             where p.x > 0 && p.y > 0: A 
>         case Point p 
>             where p.x > 0 && p.y == 0: B 
>     } 
> 
> Here, not only would we potentially test the target twice to see if it is a 
> `Point`, but we then further extract the `x` component twice and perform the 
> `p.x > 0` test twice.  
> 
> #### Optimization opportunities 
> 
> The compiler can eliminate some redundant calculations through 
> straightforward techniques.  The previous switch can be transformed to: 
> 
>     switch (x) { 
>         case Point p: 
>             if (((Point) p).x > 0 && ((Point) p).y > 0) { A } 
>             else if (((Point) p).x > 0 && ((Point) p).y > 0) { B } 
> 
> to eliminate the redundant `instanceof` (and admits further CSE 
> optimizations.)
> 
> #### Clause reordering 
> 
> The above example was easy to transform because the two `case Point` clauses 
> were adjacent.  But what if they are not?  In some cases, it is safe to 
> reorder them.  For types `T` and `U`, it is safe to reorder `case T` and 
> `case U` if the two types have no intersection; that there can be no types 
> that are subtypes of them both.  This is true when `T` and `U` are classes 
> and neither extends the other, or when one is a final class and the other is 
> an interface that the class does not implement.  
> 
> The compiler could then reorder case clauses so that all the ones whose first 
> test is `case Point` are adjacent, and then coalesce them all into a single 
> arm of the `if-else` chain. 
> 
> A possible spoiler here is fallthrough; if case A falls into case B, then 
> cases A and B have to be moved as a group.  (This is another reason to 
> consider limiting fallthrough.) 
> 
> A bigger possible spoiler here is separate compilation.  If at compile time, 
> we see that `T` and `U` are disjoint types, do we want to bake that 
> assumption into the compilation, or do we have to re-check that assumption at 
> runtime?  
> 
> #### Summary of if-else translation 
> 
> While the if-else translation at first looks pretty bad, we are able to 
> extract a fair amount of redundancy through well-understood compiler 
> transformations.  If an N-way switch has only M distinct types in it, in most 
> cases we can reduce the cost from _O(N)_ to _O(M)_.  Sometimes _M == N_, so 
> this doesn't help, but sometimes _M << N_ (and sometimes `N` is small, in 
> which case _O(N)_ is fine.) 
> 
> Reordering clauses involves some risk; specifically, that the class hierarchy 
> will change between compile and run time.  It seems eminently safe to reorder 
> `String` and `Integer`, but more questionable to reorder an arbitrary class 
> `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, 
> because it might easily be changed to do so later.  Ideally we'd like to 
> perform class-hierarchy optimizations using the runtime hierarchy, not the 
> compile-time hierarchy. 
> 
> ## Type classifiers 
> 
> The technique outlined in _Part 1_, where we lower the complex switch to a 
> dense `int` switch, and use an indy-based classifier to select an index, is 
> applicable here as well.  First let's consider a switch consisting only of 
> unguarded type-test patterns, optionally with a default clause.
> 
> We'll start with an `indy` bootstrap whose static argument are `Class` 
> constants corresponding to each arm of the switch, whose dynamic argument is 
> the switch target, and whose return value is a case number (or distinguished 
> sentinels for "no match" and `null`.)  We can easily implement such a 
> bootstrap with a linear search, but can also do better; if some subset of the 
> classes are `final`, we can choose between these more quickly (such as via 
> binary search on `hashCode()`, hash function, or hash table), and we need 
> perform only a single operation to test all of those at once. Dynamic 
> techniques (such as a building a hash map of previously seen target types), 
> which `indy` is well-suited to, can asymptotically approach _O(1)_ even when 
> the classes involved are not final. 
> 
> So we can lower: 
> 
>     switch (x) { 
>         case T t: A 
>         case U u: B 
>         case V v: C 
>     } 
> 
> to 
> 
>     int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x) 
>     switch (y) { 
>         case 0: A 
>         case 1: B 
>         case 2: C 
>     } 
> 
> This has the advantages that the generated code is very similar to the source 
> code, we can (in some cases) get _O(1)_ dispatch performance, and we can 
> handle fallthrough with no additional complexity. 
> 
> #### Guards 
> 
> There are two approaches we could take to add support for guards into the 
> process; we could try to teach the bootstrap about guards (and would have to 
> pass locals that appear in guard expressions as additional arguments to the 
> classifier), or we could leave guards to the generated bytecode.  The latter 
> seems far more attractive, but requires some tweaks to the bootstrap 
> arguments and to the shape of the generated code. 
> 
> If the classifier says "you have matched case #3", but then we fail the guard 
> for #3, we want to go back into the classifier and start again at #4.  
> (Sometimes the classifier can also use this information ("start over at #4") 
> to optimize away unnecessary tests.)
> 
> We add a second argument (where to start) to the classifier invocation 
> signature, and wrap the switch in a loop, lowering: 
> 
>     switch (target) { 
>         case T t where (e1): A 
>         case T t where (e2): B 
>         case U u where (e3): C 
>     } 
> 
> into 
> 
>     int index = -1; // start at the top 
>     while (true) { 
>         index = indy[...](target, index) 
>         switch (index) { 
>             case 0: if (!e1) continue; A 
>             case 1: if (!e2) continue; B 
>             case 2: if (!e3) continue; C 
>             default: break;
>         } 
>         break; 
>     } 
> 
> For cases where the same type test is repeated in consecutive positions (at N 
> and N+1), we can have the static compiler coalesce them as above, or we could 
> have the bootstrap maintain a table so that if you re-enter the bootstrap 
> where the previous answer was N, then it can immediately return           
> N+1.  Similarly, if N and N+1 are known to be mutually exclusive types (like 
> `String` and `Integer`), on reentering the classifier with N, we can skip 
> right to N+2 since if we matched `String`, we cannot match `Integer`. Lookup 
> tables for such optimizations can be built at callsite linkage time. 
> 
> #### Mixing constants and type tests 
> 
> This approach also extends to tests that are a mix of constant patterns and 
> type-test patterns, such as: 
> 
>     switch (x) { 
>         case "Foo": ... 
>         case 0L: ... 
>         case Integer i: 
>     } 
> 
> We can extend the bootstrap protocol to accept constants as well as types, 
> and it is a straightforward optimization to combine both type matching and 
> constant matching in a single pass. 
> 
> ## Nested patterns
> 
> Nested patterns are essentially guards; even if we don't expose guards in the 
> language, we can desugar
> 
>     case Point(0, var x):
> 
> into the equivalent of
> 
>     case Point(var a, var x) && a matches 0: 
> 
> using the same translation story as above -- use the classifier to select a 
> candidate case arm based on the top-type of the pattern, and then do 
> additional checks in the generated bytecode, and if the checks fail, continue 
> and re-enter the classifier starting at the next case.  
> 
> #### Explicit continue
> 
> An alternative to exposing guards is to expose an explicit `continue` 
> statement in switch, which would have the effect of "keep matching at the 
> next case."  Then guards could be expressed imperatively as:
> 
>     case P: 
>         if (!guard)
>             continue;
>         ...
>         break;
>     case Q: …
> 
A nice idea, but careful: it is already meaningful to write:

        while (…) { switch (…) { case 1: … case 2: if (foo) continue; … } }

and expect the `continue` to start a new iteration of the `while` loop.  
Indeed, this fact was already exploited above under “### Guards”.

If you really want to introduce the idea of “continuing a switch dispatch" into 
the surface syntax, even if only for expository purposes, let me suggest the 
form `continue switch;`.

switch (…) {
  case P: 
        if (!guard)
            continue switch;
        ...
        break;
    case Q: …
}

—Guy

Reply via email to