Re: Switch translation, part 2

Remi Forax Tue, 02 Jan 2018 04:36:13 -0800

Hi all, 
while the proposed translation is a good translation by default, when you can 
have fallthroughs or guards, if you have none of them, it's not the best 
translation.


[CC John Rose because i may say something stupid] 

The problem is that the VM doesn't not prune never called cases in a switch 
while it does that for the branch of a if, so an if ... else can be faster than 
a switch in the case only some cases are exercise at runtime. 
Also note that a lot of tableswitch end up to be generated by the VM as if .. 
else if you take a look to the generated assembly code because jumping to a 
computed address is not free. 
So it seems a good idea in the case of an expression switch with no guard to 
not generate an indy + a tableswitch but just an indy. 

So instead of lowering: 

switch (x) { 
case T t: A 
case U u: B 
case V v: C 
} 

to 

int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x) 
switch (y) { 
case 0: A 
case 1: B 
case 2: C 
} 

I propose to lowering it to something similar to the lambda translation: 
var expr = indy[bootstrap=exprTypeSwich(1, T.class, U.class, V.class)(x); 
and let the bootstrap to do all the lifting. 
The first bootstrap argument is the number of the switch in the code, here 1. 

With A, B and C being desugared as static methods respectively to 
switch$1case$0, switch$1case$1 and switch$1case$2 (i.e. "switch$" + 
switchNumber + "case$" + caseNumber. 

The bootstrap method exprTypeSwich can work like an inlining cache if the 
number of branches actually visited is small, this is interesting because the 
performance of an expression switch will be in the same ball park as the 
performance of the corresponding virtual call, and if there is too many 
branches used, revert to use a new method handle combinator that does a 
tableswitch*. 

cheers, 
Rémi 

* the JSR 292 has talked several times to introduce such method handle 
combinator. 

> De: "Brian Goetz" <brian.go...@oracle.com>
> À: "amber-spec-experts" <amber-spec-experts@openjdk.java.net>
> Envoyé: Lundi 11 Décembre 2017 22:37:57
> Objet: Switch translation, part 2

> # Switch Translation, Part 2 -- type test patterns and guards
> #### Maurizio Cimadamore and Brian Goetz
> #### December 2017

> This document examines possible translation of `switch` constructs involving
> `case` labels that include type-test patterns, potentially with guards. Part 3
> will address translation of destructuring patterns, nested patterns, and OR
> patterns.

> ## Type-test patterns

> Type-test patterns are notable because their applicability predicate is purely
> based on the type system, meaning that the compiler can directly reason about
> it both statically (using flow analysis, optimizing away dynamic type tests)
> and dynamically (with `instanceof`.) A switch involving type-tests:

> switch (x) {
> case String s: ...
> case Integer i: ...
> case Long l: ...
> }

> can (among other strategies) be translated into a chain of `if-else` using
> `instanceof` and casts:

> if (x instanceof String) { String s = (String) x; ... }
> else if (x instanceof Integer) { Integer i = (Integer) x; ... }
> else if (x instanceof Long) { Long l = (Long) x; ... }

> #### Guards

> The `if-else` desugaring can also naturally handle guards:

> switch (x) {
> case String s
> where (s.length() > 0): ...
> case Integer i
> where (i > 0): ...
> case Long l
> where (l > 0L): ...
> }

> can be translated to:

> if (x instanceof String
> && ((String) x).length() > 0) { String s = (String) x; ... }
> else if (x instanceof Integer
> && ((Integer) x) > 0) { Integer i = (Integer) x; ... }
> else if (x instanceof Long
> && ((Long) x) > 0L) { Long l = (Long) x; ... }

> #### Performance concerns

> The translation to `if-else` chains is simple (for switches without
> fallthrough), but is harder for the VM to optimize, because we've used a more
> general control flow mechanism. If the target is an empty `String`, which 
> means
> we'd pass the first `instanceof` but fail the guard, class-hierarchy analysis
> could tell us that it can't possibly be an `Integer` or a `Long`, and so
> there's no need to perform those tests. But generating code that takes
> advantage of this information is more complex.

> In the extreme case, where a switch consists entirely of type test patterns 
> for
> final classes, this could be performed as an O(1) operation by hashing. And
> this is a common case involving switches over alternatives in a sum (sealed)
> type. (We probably shouldn't rely on finality at compile time, as this can
> change between compile and run time, but we would like to take advantage of
> this at run time if we can.)

> Finally, the straightforward static translation may miss opportunities for
> optimization. For example:

> switch (x) {
> case Point p
> where p.x > 0 && p.y > 0: A
> case Point p
> where p.x > 0 && p.y == 0: B
> }

> Here, not only would we potentially test the target twice to see if it is a
> `Point`, but we then further extract the `x` component twice and perform the
> `p.x > 0` test twice.

> #### Optimization opportunities

> The compiler can eliminate some redundant calculations through straightforward
> techniques. The previous switch can be transformed to:

> switch (x) {
> case Point p:
> if (((Point) p).x > 0 && ((Point) p).y > 0) { A }
> else if (((Point) p).x > 0 && ((Point) p).y > 0) { B }

> to eliminate the redundant `instanceof` (and could be further transformed to
> eliminate the downstream redundant computations.)

> #### Clause reordering

> The above example was easy to transform because the two `case Point` clauses
> were adjacent. But what if they are not? In some cases, it is safe to reorder
> them. For types `T` and `U`, it is safe to reorder `case T` and `case U` if 
> the
> two types have no intersection; that there can be no types that are subtypes 
> of
> them both. This is true when `T` and `U` are classes and neither extends the
> other, or when one is a final class and the other is an interface that the
> class does not implement.

> The compiler could then reorder case clauses so that all the ones whose first
> test is `case Point` are adjacent, and then coalesce them all into a single 
> arm
> of the `if-else` chain.

> A possible spoiler here is fallthrough; if case A falls into case B, then 
> cases
> A and B have to be moved as a group. (This is another reason to consider
> limiting fallthrough.)

> #### Summary of if-else translation

> While the if-else translation at first looks pretty bad, we are able to 
> extract
> a fair amount of redundancy through well-understood compiler transformations.
> If an N-way switch has only M distinct types in it, in most cases we can 
> reduce
> the cost from _O(N)_ to _O(M)_. Sometimes _M == N_, so this doesn't help, but
> sometimes _ M << N _ (and sometimes `N` is small, in which case _O(N)_ is
> fine.)

> Reordering clauses involves some risk; specifically, that the class hierarchy
> will change between compile and run time. It seems eminently safe to reorder
> `String` and `Integer`, but more questionable to reorder an arbitrary class
> `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, because
> it might easily be changed to do so later. Ideally we'd like to perform
> class-hierarchy optimizations using the runtime hierarchy, not the 
> compile-time
> hierarchy.

> ## Type classifiers

> The technique outlined in _Part 1_, where we lower the complex switch to a 
> dense
> `int` switch, and use an indy-based classifier to select an index, is
> applicable here as well. First let's consider a switch consisting only of
> unguarded type-test patterns (and optionally a default clause.)

> We'll start with an `indy` bootstrap whose static argument are `Class` 
> constants
> corresponding to each arm of the switch, whose dynamic argument is the switch
> target, and whose return value is a case number (or distinguished sentinels 
> for
> "no match" and `null`.) We can easily implement such a bootstrap with a linear
> search, but can also do better; if some subset of the classes are `final`, we
> can choose between these more quickly (such as via binary search on
> `hashCode()`, hash function, or hash table), and we need perform only a single
> operation to test all of those at once. Dynamic techniques (such as a building
> a hash map of previously seen target types), which `indy` is well-suited to,
> can asymptotically approach _O(1)_ even when the classes involved are not
> final.

> So we can lower:

> switch (x) {
> case T t: A
> case U u: B
> case V v: C
> }

> to

> int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
> switch (y) {
> case 0: A
> case 1: B
> case 2: C
> }

> This has the advantages that the generated code is very similar to the source
> code, we can (in some cases) get _O(1)_ dispatch performance, and we can 
> handle
> fallthrough with no additional complexity.

> #### Guards

> There are two approaches we could take to add support for guards into the
> process; we could try to teach the bootstrap about guards (and would have to
> pass locals that appear in guard expressions as additional arguments to the
> classifier), or we could leave guards to the generated bytecode. The latter
> seems far more attractive, but requires some tweaks to the bootstrap arguments
> and to the shape of the generated code.

> If the classifier says "you have matched case #3", but then we fail the guard
> for #3, we want to go back into the classifier and start again at #4.
> Additionally, we'd like for the classifier to use this information ("start 
> over
> at #4") to optimize away unnecessary tests.

> We add a second argument (where to start) to the classifier invocation
> signature, and wrap the switch in a loop, lowering:

> switch (x) {
> case T t where (e1): A
> case T t where (e2): B
> case U u where (e3): C
> }

> into

> int y = -1; // start at the top
> while (true) {
> y = indy[...](x, y)
> switch (y) {
> case 0: if (!e1) continue; A
> case 1: if (!e2) continue; B
> case 2: if (!e3) continue; C
> }
> break;
> }

> For cases where the same type test is repeated in consecutive positions (at N
> and N+1), we can have the static compiler coalesce them as above, or we could
> have the bootstrap maintain a table so that if you re-enter the bootstrap 
> where
> the previous answer was N, then it can immediately return N+1. Similarly, if N
> and N+1 are known to be mutually exclusive types (like `String` and 
> `Integer`),
> on reentering the classifier with N, we can skip right to N+2 since if we
> matched `String`, we cannot match `Integer`. Lookup tables for such
> optimizations can be built at link time.

> #### Mixing constants and type tests

> This approach also extends to tests that are a mix of constant patterns and
> type-test patterns, such as:

> switch (x) {
> case "Foo": ...
> case 0L: ...
> case Integer i:
> }

> We can extend the bootstrap protocol to accept constants as well as types, and
> it is a straightforward optimization to combine both type matching and 
> constant
> matching in a single pass.

Re: Switch translation, part 2

Reply via email to