Switches currently don't profile well (if at all). John can shed more light on that, since this came up on the compiler list a few weeks ago.

sent from my phone

On Mar 12, 2015 6:06 PM, "Peter Levart" <peter.lev...@gmail.com> wrote:
>
> On 03/12/2015 10:04 PM, Peter Levart wrote:
>
> ... putLongUnaligned in the style of above getLongUnaligned is more tricky
> with current code structure. But there may be a middle ground (or a sweet
> spot):
>
>     public final void putLongUnaligned(Object o, long offset, long x) {
>         if (((int) offset & 1) == 1) {
>             putLongParts(o, offset,
>                          (byte) (x >>> 0),
>                          (short) (x >>> 8),
>                          (short) (x >>> 24),
>                          (short) (x >>> 40),
>                          (byte) (x >>> 56));
>         } else if (((int) offset & 2) == 2) {
>             putLongParts(o, offset,
>                          (short)(x >>> 0),
>                          (int)(x >>> 16),
>                          (short)(x >>> 48));
>         } else if (((int) offset & 4) == 4) {
>             putLongParts(o, offset,
>                          (int)(x >> 0),
>                          (int)(x >>> 32));
>         } else {
>             putLong(o, offset, x);
>         }
>     }
>
> ...this has the same number of branches, but fewer instructions. You also
> need the following two:
>
>
> At least on Intel (with -XX:-UseUnalignedAccesses) above code (Unaligned2)
> is not any faster than your code (Unaligned) according to a JMH
> random-access test. Neither is the reversal of if/else branches
> (Unaligned1). Unaligned3 is switch-based variant (just get) and is slowest.
> Your variant seems to be the fastest by a hair:
>
> Benchmark                             Mode  Samples   Score  Score error  Units
> j.t.UnalignedTest.getLongUnaligned    avgt        5  16.375        0.837  ns/op
> j.t.UnalignedTest.getLongUnaligned1   avgt        5  18.340        0.617  ns/op
> j.t.UnalignedTest.getLongUnaligned2   avgt        5  16.784        0.969  ns/op
> j.t.UnalignedTest.getLongUnaligned3   avgt        5  19.634        0.871  ns/op
> j.t.UnalignedTest.putLongUnaligned    avgt        5  15.521        0.589  ns/op
> j.t.UnalignedTest.putLongUnaligned1   avgt        5  16.676        1.042  ns/op
> j.t.UnalignedTest.putLongUnaligned2   avgt        5  16.394        3.028  ns/op
>
>
> Regards, Peter
>
> Peter
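
For anyone who wants to play with the alignment cascade from the quoted putLongUnaligned outside the JDK, here is a rough standalone sketch. It is not the Unsafe code: the class name UnalignedSketch is made up, a little-endian java.nio.ByteBuffer stands in for the Object/offset pair, and the individual putShort/putInt calls stand in for the putLongParts overloads, which are not shown in the thread. It also assumes little-endian byte order throughout. The only point is to show that each partial store lands on an offset that is naturally aligned for its width.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK or of the patch under discussion.
final class UnalignedSketch {
    private UnalignedSketch() {}

    // Writes x at 'offset' using only stores that are naturally aligned for
    // their width. Assumes 'buf' is in LITTLE_ENDIAN order.
    static void putLongUnaligned(ByteBuffer buf, int offset, long x) {
        if ((offset & 1) == 1) {
            // Odd offset: byte, three shorts on even offsets, trailing byte.
            buf.put(offset, (byte) x);
            buf.putShort(offset + 1, (short) (x >>> 8));
            buf.putShort(offset + 3, (short) (x >>> 24));
            buf.putShort(offset + 5, (short) (x >>> 40));
            buf.put(offset + 7, (byte) (x >>> 56));
        } else if ((offset & 2) == 2) {
            // Offset is 2 mod 4: short, then an int on a multiple of 4, then short.
            buf.putShort(offset, (short) x);
            buf.putInt(offset + 2, (int) (x >>> 16));
            buf.putShort(offset + 6, (short) (x >>> 48));
        } else if ((offset & 4) == 4) {
            // Offset is 4 mod 8: two ints, each on a multiple of 4.
            buf.putInt(offset, (int) x);
            buf.putInt(offset + 4, (int) (x >>> 32));
        } else {
            // Offset is a multiple of 8: a single long store.
            buf.putLong(offset, x);
        }
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;
        // Round-trip the value at every alignment within one 8-byte window.
        for (int off = 0; off < 8; off++) {
            ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
            putLongUnaligned(buf, off, value);
            if (buf.getLong(off) != value) {
                throw new AssertionError("mismatch at offset " + off);
            }
        }
    }
}
```

A heap ByteBuffer accepts any index anyway, so this says nothing about performance; it is only a way to check the shift/width bookkeeping in each branch.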