On Mar 12, 2015, at 11:37 AM, Andrew Haley <a...@redhat.com> wrote:
>
> On 03/12/2015 05:15 PM, Peter Levart wrote:
>> ...or are JIT+CPU smart enough and there would be no difference?
>
> C2 always orders things based on profile counts, so there is no
> difference.  Your suggestion would be better for interpreted code
> and I guess C1 also, so I agree it is worthwhile.
Profile counts can partially reorganize decision trees, if they are unambiguous. The best effect from profiling is to prune untaken branches completely (leaving a deopt).

The main caveat here is that this breaks down when the profile is ambiguous, which can happen when multiple users of a library routine "pollute" the profile with divergent behaviors. See (e.g.) slides 17-19 of:
  http://cr.openjdk.java.net/~jrose/pres/201502-JVMChallenges.pdf

The JVM currently addresses this mainly by combining local profile data with type inference that crosses inline boundaries. The present case can perhaps be improved by type inference or non-local profiling on bitfields, which is partially discussed in:
  https://bugs.openjdk.java.net/browse/JDK-8001436

BTW, I like Peter's suggestion to perform localized merging of bytes to shorts (etc.) based on exact alignment. But, I'd rather see it done further down the pipeline, after vectorization.

-- John
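
To make the profile-pollution caveat concrete, here is a minimal sketch. The class and method names are hypothetical and not taken from the JDK; the code only sets up the situation and cannot observe the profile itself. The branch in sign() has one profile that all callers feed. If only the first caller ran, the "x < 0" path would look untaken and could be pruned (leaving a deopt); once the second caller runs as well, the counts become ambiguous.

```java
public class ProfilePollutionSketch {
    // Hypothetical shared "library routine". The branch below has a single
    // profile, and every caller of sign() contributes to it.
    static int sign(int x) {
        if (x < 0) {
            return -1;
        }
        return 1;
    }

    // Caller A: only ever passes non-negative values, so from its point of
    // view the "x < 0" branch is never taken.
    static int sumSignsNonNegative(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += sign(i);
        }
        return sum;
    }

    // Caller B: only ever passes negative values, so it drives the same
    // branch the opposite way and makes the shared counts ambiguous.
    static int sumSignsNegative(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += sign(-i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Run both callers often enough that sign() is likely to be profiled.
        System.out.println(sumSignsNonNegative(100_000));
        System.out.println(sumSignsNegative(100_000));
    }
}
```

Whether a particular JVM actually prunes or keeps the branch here depends on its inlining and compilation decisions. If sign() is inlined into caller A, inference on the loop variable's range might fold the branch there regardless of the shared profile, which is the kind of cross-inline inference mentioned above.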