On Thu, 1 Apr 2021 13:25:05 GMT, Jorn Vernee <[email protected]> wrote:
> This patch speeds up MethodHandle.asCollector handles where the array type is
> not Object[], as well as speeding up all collectors where the arity is
> greater than 10.
>
> The old code is creating a collector handle by combining a set of hard coded
> methods for collecting arguments into an Object[], up to a maximum of ten
> elements, and then copying this intermediate array into a final array.
>
> In principle, it shouldn't matter how slow (or fast) this handle is, because
> it should be replaced by existing bytecode intrinsification which does the
> right thing. But, through investigation it turns out that the
> intrinsification is only being applied in a very limited amount of cases:
> Object[] with max 10 elements only, only for the intermediate collector
> handles. Every other collector shape incurs overhead because it essentially
> ends up calling the ineffecient fallback handle.
>
> Rather than sticking on a band aid (I tried this, but it turned out to be
> very tricky to untangle the existing code), the new code replaces the
> existing implementation with a collector handle implemented using a
> LambdaForm, which removes the need for intrinsification, and also greatly
> reduces code-complexity of the implementation. (I plan to expose this
> collector using a public API in the future as well, so users don't have to go
> through MHs::identity to make a collector).
>
> The old code was also using a special lambda form transform for collecting
> arguments into an array. I believe this was done to take advantage of the
> existing-but-limited bytecode intrinsification, at least for Object[] with
> less than 10 elements.
>
> The new code just uses the existing collect arguments transform with the
> newly added collector handle as filter, and this works just as well for the
> existing case, but as a bonus is also much simpler, since no separate
> transform is needed. Using the collect arguments transform should also
> improve sharing.
>
> As a result of these changes a lot of code was unused and has been removed in
> this patch.
>
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch),
> as well as another variant of the benchmark that used a declared static
> identity method instead of MHs::identity (not included). Before/after
> comparison of MethodHandleAs* benchmarks (no differences there).
>
> Here are some numbers from the added benchmark:
>
> Before:
> Benchmark Mode Cnt Score Error
> Units
> TypedAsCollector.testIntCollect avgt 30 189.156 � 1.796
> ns/op
> TypedAsCollector.testIntCollectHighArity avgt 30 660.549 � 10.159
> ns/op
> TypedAsCollector.testObjectCollect avgt 30 7.092 � 0.042
> ns/op
> TypedAsCollector.testObjectCollectHighArity avgt 30 65.225 � 0.546
> ns/op
> TypedAsCollector.testStringCollect avgt 30 28.511 � 0.243
> ns/op
> TypedAsCollector.testStringCollectHighArity avgt 30 57.054 � 0.635
> ns/op
> (as you can see, just the Object[] with arity less than 10 case is fast here)
> After:
> Benchmark Mode Cnt Score Error Units
> TypedAsCollector.testIntCollect avgt 30 6.569 � 0.131 ns/op
> TypedAsCollector.testIntCollectHighArity avgt 30 8.923 � 0.066 ns/op
> TypedAsCollector.testObjectCollect avgt 30 6.813 � 0.035 ns/op
> TypedAsCollector.testObjectCollectHighArity avgt 30 9.718 � 0.066 ns/op
> TypedAsCollector.testStringCollect avgt 30 6.737 � 0.016 ns/op
> TypedAsCollector.testStringCollectHighArity avgt 30 9.618 � 0.052 ns/op
>
> Thanks,
> Jorn
That's an elegant solution.
At first i thought it might unduly perturb lambda form generation and caching.
but you slotted a different lambda form implementation underneath the varargs
implementation.
src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1277:
> 1275: if (intrinsicName == Intrinsic.IDENTITY) {
> 1276: MethodType resultType =
> type().asCollectorType(arrayType, type().parameterCount() - 1, arrayLength);
> 1277: MethodHandle collector =
> MethodHandleImpl.makeCollector(arrayType, arrayLength);
Should `MethodHandleImpl.varargsArray` still be used here since it performs
arity checking and caching?
Maybe the arity checks are performed by `asCollectorType`, but that would still
leave the caching.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3306