On Thu, 8 Apr 2021 18:51:21 GMT, Jorn Vernee <[email protected]> wrote:
> This patch adds a `tableSwitch` combinator that can be used to switch over a
> set of method handles given an index, with a fallback in case the index is
> out of bounds, much like the `tableswitch` bytecode. Here is a description of
> how it works (copied from the javadoc):
>
> Creates a table switch method handle, which can be used to switch over a
> set of target
> method handles, based on a given target index, called selector.
>
> For a selector value of {@code n}, where {@code n} falls in the range
> {@code [0, N)},
> and where {@code N} is the number of target method handles, the table
> switch method
> handle will invoke the n-th target method handle from the list of target
> method handles.
>
> For a selector value that does not fall in the range {@code [0, N)}, the
> table switch
> method handle will invoke the given fallback method handle.
>
> All method handles passed to this method must have the same type, with
> the additional
> requirement that the leading parameter be of type {@code int}. The
> leading parameter
> represents the selector.
>
> Any trailing parameters present in the type will appear on the returned
> table switch
> method handle as well. Any arguments assigned to these parameters will
> be forwarded,
> together with the selector value, to the selected method handle when
> invoking it.
>
> The combinator does not support specifying the starting index, so the switch
> cases always run from 0 to however many target handles are specified. A
> starting index can be added manually with another combination step that
> filters the input index by adding or subtracting a constant from it, which
> does not affect performance. One of the reasons for not supporting a starting
> index is that it allows for more lambda form sharing, but also simplifies the
> implementation somewhat. I guess an open question is if a convenience
> overload should be added for that case?
>
> Lookup switch can also be simulated by filtering the input through an
> injection function that translates it into a case index, which has also
> proven to have the ability to have comparable performance to, or even better
> performance than, a bytecode-native `lookupswitch` instruction. I plan to add
> such an injection function to the runtime libraries in the future as well.
> Maybe at that point it could be evaluated if it's worth it to add a lookup
> switch combinator as well, but I don't see an immediate need to include it in
> this patch.
>
> The current bytecode intrinsification generates a call for each switch case,
> which guarantees full inlining of the target method handles. Alternatively we
> could only have 1 callsite at the end of the switch, where each case just
> loads the target method handle, but currently this does not allow for
> inlining of the handles, since they are not constant.
>
> Maybe a future C2 optimization could look at the receiver input for
> invokeBasic call sites, and if the input is a phi node, clone the call for
> each constant input of the phi. I believe that would allow simplifying the
> bytecode without giving up on inlining.
>
> Some numbers from the added benchmarks:
>
> Benchmark (numCases) (offset)
> (sorted) Mode Cnt Score Error Units
> MethodHandlesTableSwitchConstant.testSwitch 5 0
> N/A avgt 30 4.186 � 0.054 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 5 150
> N/A avgt 30 4.164 � 0.057 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 10 0
> N/A avgt 30 4.124 � 0.023 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 10 150
> N/A avgt 30 4.126 � 0.025 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 25 0
> N/A avgt 30 4.137 � 0.042 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 25 150
> N/A avgt 30 4.113 � 0.016 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 50 0
> N/A avgt 30 4.118 � 0.028 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 50 150
> N/A avgt 30 4.127 � 0.019 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 100 0
> N/A avgt 30 4.116 � 0.013 ms/op
> MethodHandlesTableSwitchConstant.testSwitch 100 150
> N/A avgt 30 4.121 � 0.020 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 0
> N/A avgt 30 4.113 � 0.009 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 150
> N/A avgt 30 4.149 � 0.041 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 0
> N/A avgt 30 4.121 � 0.026 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 150
> N/A avgt 30 4.113 � 0.021 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 0
> N/A avgt 30 4.129 � 0.028 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 150
> N/A avgt 30 4.105 � 0.019 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 0
> N/A avgt 30 4.097 � 0.021 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 150
> N/A avgt 30 4.131 � 0.037 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 0
> N/A avgt 30 4.135 � 0.025 ms/op
> MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 150
> N/A avgt 30 4.139 � 0.145 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 5 0
> true avgt 30 4.894 � 0.028 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 5 0
> false avgt 30 11.526 � 0.194 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 5 150
> true avgt 30 4.882 � 0.025 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 5 150
> false avgt 30 11.532 � 0.034 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 10 0
> true avgt 30 5.065 � 0.076 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 10 0
> false avgt 30 13.016 � 0.020 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 10 150
> true avgt 30 5.103 � 0.051 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 10 150
> false avgt 30 12.984 � 0.102 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 25 0
> true avgt 30 8.441 � 0.165 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 25 0
> false avgt 30 13.371 � 0.060 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 25 150
> true avgt 30 8.628 � 0.032 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 25 150
> false avgt 30 13.542 � 0.020 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 50 0
> true avgt 30 4.701 � 0.015 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 50 0
> false avgt 30 13.562 � 0.063 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 50 150
> true avgt 30 7.991 � 3.111 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 50 150
> false avgt 30 13.543 � 0.088 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 100 0
> true avgt 30 4.712 � 0.020 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 100 0
> false avgt 30 13.600 � 0.085 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 100 150
> true avgt 30 4.676 � 0.011 ms/op
> MethodHandlesTableSwitchRandom.testSwitch 100 150
> false avgt 30 13.476 � 0.043 ms/op
>
>
> Testing:
> - [x] Running of included benchmarks
> - [x] Inspecting inlining trace and verifying method handle targets are
> inlined
> - [x] Running TestTableSwitch test (currently the only user of the new code)
> - [x] Running java/lang/invoke tests (just in case)
> - [x] Some manual testing
>
> Thanks,
> Jorn
> you have two strategy to de-sugar a switch,
>
> * if what you do after the case do not mutate any variables, you can desugar
> each case to a method more or less like a lambda (it's not exactly like a
> lambda because there is no capture) and you have an indy in front that will
> call the right method handles
>
> * you have a front end, with an indy that associate the object to an int and
> a backend which is tableswitch in the bytecode
>
> ...
>
> The tests above are using the first strategy
No, they are using the second strategy. The SwitchBootstraps patch I linked to
replaces the front end `lookupswitch` of a String switch with an
`invokedynamic` that computes an index for the back end jump table, which is
still a `tableswitch` in the bytecode.
As John also described, a hypothetical lookupSwitch combinator can be emulated
by using a `k -> [0, N)` projection that feeds into the tableSwitch combinator
that is proposed by this PR. The point of the examples I linked to was to show
several flavors of projection functions as an example of how this could be
implemented, and to show that they have competitive performance with a native
`lookupswitch` instruction (the 'legacy' case). i.e. the benchmarks show the
difference between `lookupswitch` implemented in bytecode, and a `k -> [0, N)`
projection function built by an `invokedynamic`. (sorry, I should have offered
more explanation in the first place)
The combinator added by _this_ PR is not meant to replace any part of the
String switch translation. For pattern switch the `tableSwitch` combinator
_could_ be used to implement the front end `k -> [0, N)` projection, but it is
not strictly required. Either way, that seems orthogonal to this PR.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3401