On 03/12/2015 10:04 PM, Peter Levart wrote:
... putLongUnaligned in the style of the getLongUnaligned above is
trickier with the current code structure. But there may be a middle
ground (or a sweet spot):
public final void putLongUnaligned(Object o, long offset, long x) {
    if (((int) offset & 1) == 1) {
        // odd offset: byte, short, short, short, byte
        putLongParts(o, offset,
                     (byte)  (x >>> 0),
                     (short) (x >>> 8),
                     (short) (x >>> 24),
                     (short) (x >>> 40),
                     (byte)  (x >>> 56));
    } else if (((int) offset & 2) == 2) {
        // offset == 2 (mod 4): short, int, short
        putLongParts(o, offset,
                     (short) (x >>> 0),
                     (int)   (x >>> 16),
                     (short) (x >>> 48));
    } else if (((int) offset & 4) == 4) {
        // offset == 4 (mod 8): int, int
        putLongParts(o, offset,
                     (int) (x >>> 0),
                     (int) (x >>> 32));
    } else {
        // 8-byte aligned: a single long store
        putLong(o, offset, x);
    }
}
...this has the same number of branches, but fewer instructions. You
also need the following two:
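The two helper overloads are not reproduced in the quote. Below is a minimal, self-contained sketch of how the decomposition could be checked, under stated assumptions: a little-endian java.nio.ByteBuffer stands in for Unsafe raw memory, an int index replaces the (Object o, long offset) pair, and the class name UnalignedSketch and its putLongParts overloads are hypothetical, not the actual JDK code. It shows that each branch splits the long into parts that together cover all 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: a little-endian ByteBuffer stands in for Unsafe memory.
public class UnalignedSketch {
    static final ByteBuffer MEM = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);

    // Hypothetical helper for odd offsets: byte, short, short, short, byte (1+2+2+2+1 bytes).
    static void putLongParts(int off, byte b0, short s1, short s2, short s3, byte b4) {
        MEM.put(off, b0);
        MEM.putShort(off + 1, s1);
        MEM.putShort(off + 3, s2);
        MEM.putShort(off + 5, s3);
        MEM.put(off + 7, b4);
    }

    // Hypothetical helper for offset == 2 (mod 4): short, int, short (2+4+2 bytes).
    static void putLongParts(int off, short s0, int i1, short s2) {
        MEM.putShort(off, s0);
        MEM.putInt(off + 2, i1);
        MEM.putShort(off + 6, s2);
    }

    // Hypothetical helper for offset == 4 (mod 8): int, int (4+4 bytes).
    static void putLongParts(int off, int i0, int i1) {
        MEM.putInt(off, i0);
        MEM.putInt(off + 4, i1);
    }

    // Same branch structure as the quoted putLongUnaligned, little-endian part order.
    static void putLongUnaligned(int off, long x) {
        if ((off & 1) == 1) {
            putLongParts(off, (byte) (x >>> 0), (short) (x >>> 8),
                         (short) (x >>> 24), (short) (x >>> 40), (byte) (x >>> 56));
        } else if ((off & 2) == 2) {
            putLongParts(off, (short) (x >>> 0), (int) (x >>> 16), (short) (x >>> 48));
        } else if ((off & 4) == 4) {
            putLongParts(off, (int) (x >>> 0), (int) (x >>> 32));
        } else {
            MEM.putLong(off, x);
        }
    }

    public static void main(String[] args) {
        long[] values = {0x0123456789ABCDEFL, -2L, Long.MIN_VALUE};
        for (long v : values) {
            for (int off = 0; off < 8; off++) {
                putLongUnaligned(off, v);
                // Read back with a plain little-endian getLong and compare.
                if (MEM.getLong(off) != v) {
                    throw new AssertionError("mismatch at offset " + off);
                }
            }
        }
        System.out.println("ok");
    }
}
```

The point of the split is that, relative to an 8-byte-aligned base, every part lands on its natural alignment (for example, with offset == 2 (mod 4) the int at offset + 2 is 4-byte aligned). The ByteBuffer itself does not care about alignment, so this only checks that the bytes end up in the right places.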
At least on Intel (with -XX:-UseUnalignedAccesses), the code above
(Unaligned2) is not any faster than your code (Unaligned) according to a
JMH random-access test. Nor is the variant with the if/else branches
reversed (Unaligned1). Unaligned3 is a switch-based variant (get only)
and is the slowest. Your variant seems to be the fastest by a hair:
Benchmark                            Mode  Samples   Score    Error  Units
j.t.UnalignedTest.getLongUnaligned   avgt        5  16.375 ±  0.837  ns/op
j.t.UnalignedTest.getLongUnaligned1  avgt        5  18.340 ±  0.617  ns/op
j.t.UnalignedTest.getLongUnaligned2  avgt        5  16.784 ±  0.969  ns/op
j.t.UnalignedTest.getLongUnaligned3  avgt        5  19.634 ±  0.871  ns/op
j.t.UnalignedTest.putLongUnaligned   avgt        5  15.521 ±  0.589  ns/op
j.t.UnalignedTest.putLongUnaligned1  avgt        5  16.676 ±  1.042  ns/op
j.t.UnalignedTest.putLongUnaligned2  avgt        5  16.394 ±  3.028  ns/op
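The switch-based get (Unaligned3) is not included in this message. For readers who want a picture of what "switch-based" means here, the following is a hypothetical sketch only, not the benchmarked code: it assumes little-endian order, uses a ByteBuffer in place of Unsafe memory, and the class name SwitchGetSketch is invented. It dispatches on offset & 7 and reassembles the long from the same naturally aligned parts used by the put variant.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a switch-based unaligned get (little-endian assumed).
public class SwitchGetSketch {
    static final ByteBuffer MEM = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);

    static long getLongUnaligned(int off) {
        switch (off & 7) {
            case 0: // 8-byte aligned: one long load
                return MEM.getLong(off);
            case 4: // int, int
                return (MEM.getInt(off) & 0xFFFF_FFFFL)
                     | ((long) MEM.getInt(off + 4) << 32);
            case 2:
            case 6: // short, int, short
                return (MEM.getShort(off) & 0xFFFFL)
                     | ((MEM.getInt(off + 2) & 0xFFFF_FFFFL) << 16)
                     | ((long) MEM.getShort(off + 6) << 48);
            default: // odd offsets: byte, short, short, short, byte
                return (MEM.get(off) & 0xFFL)
                     | ((MEM.getShort(off + 1) & 0xFFFFL) << 8)
                     | ((MEM.getShort(off + 3) & 0xFFFFL) << 24)
                     | ((MEM.getShort(off + 5) & 0xFFFFL) << 40)
                     | ((long) MEM.get(off + 7) << 56);
        }
    }

    public static void main(String[] args) {
        long v = 0x0123456789ABCDEFL;
        for (int off = 0; off < 16; off++) {
            MEM.putLong(off, v); // write with a plain little-endian store
            if (getLongUnaligned(off) != v) {
                throw new AssertionError("mismatch at offset " + off);
            }
        }
        System.out.println("ok");
    }
}
```

The lower parts are masked so that sign extension does not leak into higher bits; the topmost part needs no mask because the left shift discards its sign-extension bits.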
Regards, Peter