So, very few deopt events in the logs (exactly 4 in fact, in both the
performant and non-performant cases, and for the exact same methods), but
in the case where performance has degraded I only see an initial
compilation for the problem method and not the later inlining I see in the
performant case. I’ll dig through the rest of the logs and try to see if
there are any differences leading up to the inlining.
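
For anyone repeating this kind of count, here is a minimal sketch of
tallying deopt events from a -XX:+LogCompilation file. It assumes runtime
deopt events appear as <uncommon_trap ...> elements on a single line with
thread=, reason= and action= attributes, while trap sites recorded during
compilation carry bci= and no thread=; check that against your own log
before trusting the numbers. The class name DeoptTally is made up for
illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeoptTally {
    // Assumed attribute shape: reason='...' and action='...' on the same line.
    private static final Pattern REASON = Pattern.compile(" reason='([^']*)'");
    private static final Pattern ACTION = Pattern.compile(" action='([^']*)'");

    /** Counts runtime uncommon_trap events, keyed by "reason/action". */
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Assumption: only runtime events carry a thread attribute.
            if (!line.contains("<uncommon_trap") || !line.contains(" thread='")) {
                continue;
            }
            Matcher r = REASON.matcher(line);
            Matcher a = ACTION.matcher(line);
            String key = (r.find() ? r.group(1) : "?") + "/"
                       + (a.find() ? a.group(1) : "?");
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the LogCompilation output file.
        tally(Files.readAllLines(Paths.get(args[0])))
            .forEach((key, n) -> System.out.println(n + "\t" + key));
    }
}
```

A key that keeps growing in steady state with action "none" would be the
pattern described further down the thread.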

On the bright side while going through the logs I did spot one obvious
snafu in our code (unnecessary MutableCallSite usage), and have got a 2.5
times speed up on another benchmark, so I’m not too unhappy. :-)

On 20/01/2015 17:14, "MacGregor, Duncan (GE Energy Management)"
<duncan.macgre...@ge.com> wrote:

>Hmm, 8068915 hasn’t fixed it, but running fewer benchmarks seems to make
>the problem go away, so it looks like there’s something going wrong fairly
>deep in our runtime. Trying the full suite with compilation logging
>enabled now to see if I can find a smoking gun.
>
>On 20/01/2015 12:40, "Vladimir Ivanov" <vladimir.x.iva...@oracle.com>
>wrote:
>
>>Duncan, thanks a lot for giving it a try!
>>
>>If you plan to spend more time on it, please apply 8068915 [1] as well. I
>>saw huge intermittent performance regressions due to a continuous
>>deoptimization storm. You can look into -XX:+LogCompilation output and
>>look for repeated deoptimization events in steady state w/ Action_none.
>>Also, there are deoptimization statistics in the log (at least, in jdk9).
>>They are located right before the compilation_log tag.
>>
>>Thanks again for the valuable feedback!
>>
>>Best regards,
>>Vladimir Ivanov
>>
>>[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00
>>
>>On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote:
>>> Okay, I’ve done some tests of this with the micro benchmarks for our
>>> language & runtime which show pretty much no change except for one test
>>> which is now almost 3x slower. It uses nested loops to iterate over an
>>> array and concatenate the string-like objects it contains, and replaces
>>> elements with these new longer string-like objects. It’s a bit of a
>>> pathological case, and I haven’t seen the same sort of degradation in
>>> the other benchmarks or in real applications, but I haven’t done serious
>>> benchmarking of them with this change.
>>>
>>> I shall see if the test case can be reduced down to anything simpler
>>> while still showing the same performance behaviour, and try to add some
>>> compilation logging options to narrow down what’s going on.
>>>
>>> Duncan.
>>>
>>> On 16/01/2015 17:16, "Vladimir Ivanov" <vladimir.x.iva...@oracle.com>
>>> wrote:
>>>
>>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/
>>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/
>>>> https://bugs.openjdk.java.net/browse/JDK-8063137
>>>>
>>>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution
>>>> significantly distorted compilation decisions. It affected inlining
>>>> and hindered some optimizations. It causes significant performance
>>>> regressions for Nashorn (on Octane benchmarks).
>>>>
>>>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a
>>>> branch is never taken. It can cause a missed optimization opportunity,
>>>> and not just an increase in code size. For example, a non-pruned branch
>>>> can break escape analysis.
>>>>
>>>> Currently, there are 2 problems:
>>>>    - branch frequencies profile pollution
>>>>    - deoptimization counts pollution
>>>>
>>>> Branch frequency pollution hides from JIT the fact that a branch is
>>>> never taken. Since GWT LambdaForms (and hence their bytecode) are
>>>> heavily shared, but the behavior is specific to MethodHandle, there's
>>>> no way for JIT to understand how a particular GWT instance behaves.
>>>>
>>>> The solution I propose is to do profiling in Java code and feed it to
>>>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where
>>>> profiling info is stored. Once JIT kicks in, it can retrieve these
>>>> counts, if corresponding MethodHandle is a compile-time constant (and
>>>> it is usually the case). To communicate the profile data from Java code
>>>> to JIT, MethodHandleImpl::profileBranch() is used.
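
The per-handle counter idea can be illustrated in plain Java with the
public java.lang.invoke API. This is only a sketch under stated
assumptions, not the code from the webrev: the helper profile(), the
class GwtProfileSketch, and the choice of slot 0 for "test failed" and
slot 1 for "test passed" are all made up here, and nothing in it feeds
the counts to the JIT the way the proposed
MethodHandleImpl::profileBranch() does.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class GwtProfileSketch {
    // Hypothetical helper: records the test outcome in this handle's own
    // counters and passes the result through unchanged.
    static boolean profile(int[] counts, boolean result) {
        counts[result ? 1 : 0]++;
        return result;
    }

    static boolean isString(Object o) { return o instanceof String; }
    static String onString(Object o)  { return "string"; }
    static String onOther(Object o)   { return "other"; }

    /** Builds a GWT handle whose test updates the given int[2] array. */
    static MethodHandle profiledGwt(int[] counts) throws ReflectiveOperationException {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType unary = MethodType.methodType(String.class, Object.class);
        MethodHandle test = l.findStatic(GwtProfileSketch.class, "isString",
                MethodType.methodType(boolean.class, Object.class));
        MethodHandle prof = l.findStatic(GwtProfileSketch.class, "profile",
                MethodType.methodType(boolean.class, int[].class, boolean.class));
        // Bind the array so every GWT instance owns its counters, even if
        // the underlying bytecode is shared between instances.
        MethodHandle counted = MethodHandles.filterReturnValue(test, prof.bindTo(counts));
        return MethodHandles.guardWithTest(counted,
                l.findStatic(GwtProfileSketch.class, "onString", unary),
                l.findStatic(GwtProfileSketch.class, "onOther", unary));
    }

    /** Runs the inputs through a fresh profiled GWT and returns its counters. */
    static int[] countsFor(Object... inputs) {
        int[] counts = new int[2];
        try {
            MethodHandle gwt = profiledGwt(counts);
            for (Object in : inputs) {
                String ignored = (String) gwt.invoke(in);
            }
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three strings and one Integer: prints [1, 3]
        System.out.println(Arrays.toString(countsFor("a", "b", 1, "c")));
    }
}
```

A handle whose counters read [0, N] never took the fallback, which is
the per-instance fact that a shared bytecode profile cannot express.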
>>>>
>>>> If GWT MethodHandle isn't a compile-time constant, profiling should
>>>> proceed. It happens when corresponding LambdaForm is already shared;
>>>> for newly created GWT MethodHandles profiling can occur only in native
>>>> code (dedicated nmethod for a single LambdaForm). So, when compilation
>>>> of the whole MethodHandle chain is triggered, the profile should be
>>>> already gathered.
>>>>
>>>> Overriding branch frequencies is not enough. Statistics on
>>>> deoptimization events are also polluted. Even if a branch is never
>>>> taken, JIT doesn't issue an uncommon trap there unless corresponding
>>>> bytecode doesn't trap too much and doesn't cause too many recompiles.
>>>>
>>>> I added @IgnoreProfile and placed it only on GWT LambdaForms. When JIT
>>>> sees it on some method, Compile::too_many_traps &
>>>> Compile::too_many_recompiles for that method always return false. It
>>>> allows JIT to prune the branch based on custom profile and recompile
>>>> the method, if the branch is visited.
>>>>
>>>> For now, I wanted to keep the fix very focused. The next thing I plan
>>>> to do is to experiment with ignoring deoptimization counts for other
>>>> LambdaForms which are heavily shared. I already saw problems caused by
>>>> deoptimization counts pollution (see JDK-8068915 [2]).
>>>>
>>>> I plan to backport the fix into 8u40, once I finish extensive
>>>> performance testing.
>>>>
>>>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite,
>>>> Octane).
>>>>
>>>> Thanks!
>>>>
>>>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915
>>>> [2] almost completely recover peak performance after LambdaForm
>>>> sharing [3]. There's one more problem left (non-inlined MethodHandle
>>>> invocations are more expensive when LFs are shared), but it's a story
>>>> for another day.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877
>>>>      8059877: GWT branch frequencies pollution due to LF sharing
>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915
>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703
>>>>      JEP 210: LambdaForm Reduction and Caching
>>>> _______________________________________________
>>>> mlvm-dev mailing list
>>>> mlvm-dev@openjdk.java.net
>>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
