On 09/04/2021 18:54, Remi Forax wrote:
----- Mail original -----
De: "Jorn Vernee" <jver...@openjdk.java.net>
À: "core-libs-dev" <core-libs-dev@openjdk.java.net>
Envoyé: Vendredi 9 Avril 2021 12:51:53
Objet: RFR: 8263087: Add a MethodHandle combinator that switches over a set of 
MethodHandles
This patch adds a `tableSwitch` combinator that can be used to switch over a set
of method handles given an index, with a fallback in case the index is out of
bounds, much like the `tableswitch` bytecode.

The combinator does not support specifying the starting index, so the switch
cases always run from 0 to however many target handles are specified. A
starting index can be added manually with another combination step that filters
the input index by adding or subtracting a constant from it, which does not
affect performance. One of the reasons for not supporting a starting index is
that it allows for more lambda form sharing, but also simplifies the
implementation somewhat. I guess an open question is if a convenience overload
should be added for that case?
I think the combinator should be lookupswitch which is more general than 
tableswitch with a special case when generating the bytecode to generate a 
tableswitch instead of a lookupswitch if the indexes are subsequent.
One of the bigger downsides I see in supporting lookupswitch directly is that the lambda form and intrinsified bytecode become dependent on the key set, which allows for less sharing. Something that is not/less of a problem with tableswitch + filter function, because the filter function could potentially be the same for any key set (where the key set is bound to the filter function instead).

Lookup switch can also be simulated by filtering the input through an injection
function that translates it into a case index, which has also proven to have
the ability to have comparable performance to, or even better performance than,
a bytecode-native `lookupswitch` instruction. I plan to add such an injection
function to the runtime libraries in the future as well. Maybe at that point it
could be evaluated if it's worth it to add a lookup switch combinator as well,
but I don't see an immediate need to include it in this patch.

As i said in the bug when we discuss about that the filtering function,
i believe that the filtering function for emulating lookupswitch is 
lookupswitch itself.
Right, but lookupswitch also ties us into C2's optimization strategy for lookupswitch. Having the ability to specify the filter function allows picking a better one for the particular use-case. For instance for switches with a large-ish number of cases (15+) it's faster to use a HashMap lookup as a filtering function (according to my benchmarking), with comparinble results to native lookupswitch if the filter function uses a tree of if/else.

Though, I'm not saying that it's not worth it to add a lookupswitch combinator as well, to me it seems like tableswitch is the more flexible/minimal primitive, because it doesn't force the use of a particular lookup strategy.

WRT picking the translation strategy based on the set of keys; I'm note super keen on that. Since the MethodHandle combinators are a low-level API, I ended up adopting a simple 'what you see is what you get' philosophy as much as possible, with the possibility of building other use-cases on top. i.e. a tableSwitch combinator that reliably translates into the tableswitch bytecode, a lookupSwitch combinator that reliably translates into the lookupswitch bytecode, and an exception if I get the key set wrong, rather than silently switching strategies to one or the other.

The current bytecode intrinsification generates a call for each switch case,
which guarantees full inlining of the target method handles. Alternatively we
could only have 1 callsite at the end of the switch, where each case just loads
the target method handle, but currently this does not allow for inlining of the
handles, since they are not constant.
This scheme also allows to never JIT compile a branch which is never used.

Yes, that's a good point, thanks.

Thanks for the input,
Jorn


Maybe a future C2 optimization could look at the receiver input for invokeBasic
call sites, and if the input is a phi node, clone the call for each constant
input of the phi. I believe that would allow simplifying the bytecode without
giving up on inlining.

Some numbers from the added benchmarks:
Benchmark                                        (numCases)  (offset)  (sorted)
Mode  Cnt   Score   Error  Units
MethodHandlesTableSwitchConstant.testSwitch               5         0       N/A
avgt   30   4.186 � 0.054  ms/op
MethodHandlesTableSwitchConstant.testSwitch               5       150       N/A
avgt   30   4.164 � 0.057  ms/op
MethodHandlesTableSwitchConstant.testSwitch              10         0       N/A
avgt   30   4.124 � 0.023  ms/op
MethodHandlesTableSwitchConstant.testSwitch              10       150       N/A
avgt   30   4.126 � 0.025  ms/op
MethodHandlesTableSwitchConstant.testSwitch              25         0       N/A
avgt   30   4.137 � 0.042  ms/op
MethodHandlesTableSwitchConstant.testSwitch              25       150       N/A
avgt   30   4.113 � 0.016  ms/op
MethodHandlesTableSwitchConstant.testSwitch              50         0       N/A
avgt   30   4.118 � 0.028  ms/op
MethodHandlesTableSwitchConstant.testSwitch              50       150       N/A
avgt   30   4.127 � 0.019  ms/op
MethodHandlesTableSwitchConstant.testSwitch             100         0       N/A
avgt   30   4.116 � 0.013  ms/op
MethodHandlesTableSwitchConstant.testSwitch             100       150       N/A
avgt   30   4.121 � 0.020  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch           5         0       N/A
avgt   30   4.113 � 0.009  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch           5       150       N/A
avgt   30   4.149 � 0.041  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          10         0       N/A
avgt   30   4.121 � 0.026  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          10       150       N/A
avgt   30   4.113 � 0.021  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          25         0       N/A
avgt   30   4.129 � 0.028  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          25       150       N/A
avgt   30   4.105 � 0.019  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          50         0       N/A
avgt   30   4.097 � 0.021  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch          50       150       N/A
avgt   30   4.131 � 0.037  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch         100         0       N/A
avgt   30   4.135 � 0.025  ms/op
MethodHandlesTableSwitchOpaqueSingle.testSwitch         100       150       N/A
avgt   30   4.139 � 0.145  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5         0      true
avgt   30   4.894 � 0.028  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5         0     false
avgt   30  11.526 � 0.194  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5       150      true
avgt   30   4.882 � 0.025  ms/op
MethodHandlesTableSwitchRandom.testSwitch                 5       150     false
avgt   30  11.532 � 0.034  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10         0      true
avgt   30   5.065 � 0.076  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10         0     false
avgt   30  13.016 � 0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10       150      true
avgt   30   5.103 � 0.051  ms/op
MethodHandlesTableSwitchRandom.testSwitch                10       150     false
avgt   30  12.984 � 0.102  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25         0      true
avgt   30   8.441 � 0.165  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25         0     false
avgt   30  13.371 � 0.060  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25       150      true
avgt   30   8.628 � 0.032  ms/op
MethodHandlesTableSwitchRandom.testSwitch                25       150     false
avgt   30  13.542 � 0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50         0      true
avgt   30   4.701 � 0.015  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50         0     false
avgt   30  13.562 � 0.063  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50       150      true
avgt   30   7.991 � 3.111  ms/op
MethodHandlesTableSwitchRandom.testSwitch                50       150     false
avgt   30  13.543 � 0.088  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100         0      true
avgt   30   4.712 � 0.020  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100         0     false
avgt   30  13.600 � 0.085  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100       150      true
avgt   30   4.676 � 0.011  ms/op
MethodHandlesTableSwitchRandom.testSwitch               100       150     false
avgt   30  13.476 � 0.043  ms/op

Testing:
- [x] Running of included benchmarks
- [x] Inspecting inlining trace and verifying method handle targets are inlined
- [x] Running TestTableSwitch test (currently the only user of the new code)
- [x] Running java/lang/invoke tests (just in case)
- [x] Some manual testing

Thanks,
Jorn

-------------

Commit messages:
- Improve test
- Touchup
- Use cases array + holder
- WIP - implement tableSwitch combinator in lambda form interpreter

Changes: https://git.openjdk.java.net/jdk/pull/3401/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3401&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8263087
  Stats: 959 lines in 8 files changed: 955 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3401.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3401/head:pull/3401

PR: https://git.openjdk.java.net/jdk/pull/3401

Reply via email to