On 03/12/2015 07:37 PM, Andrew Haley wrote:
> On 03/12/2015 05:15 PM, Peter Levart wrote:
>> ...or are JIT+CPU smart enough and there would be no difference?
> C2 always orders things based on profile counts, so there is no
> difference. Your suggestion would be better for interpreted code
> and I guess C1 also, so I agree it is worthwhile.
> Thanks,
> Andrew.
What about the following variant (or similar with ifs in case switch is
sub-optimal):
    public final long getLongUnaligned(Object o, long offset) {
        switch ((int) offset & 7) {
            case 1:
            case 5: return
                (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
                (toUnsignedLong(getShort(o, offset + 1)) << pickPos(48, 8)) |
                (toUnsignedLong(getInt(o, offset + 3)) << pickPos(32, 24)) |
                (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
            case 2:
            case 6: return
                (toUnsignedLong(getShort(o, offset)) << pickPos(48, 0)) |
                (toUnsignedLong(getInt(o, offset + 2)) << pickPos(32, 16)) |
                (toUnsignedLong(getShort(o, offset + 6)) << pickPos(48, 48));
            case 3:
            case 7: return
                (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
                (toUnsignedLong(getInt(o, offset + 1)) << pickPos(32, 8)) |
                (toUnsignedLong(getShort(o, offset + 5)) << pickPos(48, 40)) |
                (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
            case 4: return
                (toUnsignedLong(getInt(o, offset)) << pickPos(32, 0)) |
                (toUnsignedLong(getInt(o, offset + 4)) << pickPos(32, 32));
            case 0:
            default: return
                getLong(o, offset);
        }
    }
...it may have more branches, but fewer instructions on average per call.
Peter
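
P.S. For anyone who wants to try the idea outside the JDK, here is a rough, self-contained sketch. It is not the Unsafe code: the class name UnalignedLongSketch, the byte[] standing in for memory, the load() helper that emulates a naturally aligned access, and the bigEndian flag are all assumptions made for illustration. pickPos is assumed to return top - pos on big-endian and pos on little-endian. getLongBytewise is a hypothetical byte-at-a-time reference used only to cross-check the switch.

```java
import java.nio.ByteOrder;

final class UnalignedLongSketch {
    private final byte[] mem;        // hypothetical stand-in for the memory behind (o, offset)
    private final boolean bigEndian; // byte order being emulated

    UnalignedLongSketch(byte[] mem, boolean bigEndian) {
        this.mem = mem;
        this.bigEndian = bigEndian;
    }

    // Shift amount for a piece that starts `pos` bits into the value in memory order;
    // `top` is 64 minus the width of the piece in bits (assumed semantics).
    private int pickPos(int top, int pos) {
        return bigEndian ? top - pos : pos;
    }

    // Emulated naturally aligned load of `size` bytes, zero-extended to long.
    // Throws if the access would not be aligned, to show the switch never does that.
    private long load(long offset, int size) {
        if (offset % size != 0) {
            throw new IllegalArgumentException("misaligned load at " + offset);
        }
        long v = 0;
        for (int i = 0; i < size; i++) {
            long b = mem[(int) offset + i] & 0xFFL;
            v |= b << (8 * (bigEndian ? size - 1 - i : i));
        }
        return v;
    }

    // Same case structure as the variant above: the widest aligned pieces per offset.
    long getLongUnaligned(long offset) {
        switch ((int) offset & 7) {
            case 1: case 5:
                return (load(offset, 1) << pickPos(56, 0))
                     | (load(offset + 1, 2) << pickPos(48, 8))
                     | (load(offset + 3, 4) << pickPos(32, 24))
                     | (load(offset + 7, 1) << pickPos(56, 56));
            case 2: case 6:
                return (load(offset, 2) << pickPos(48, 0))
                     | (load(offset + 2, 4) << pickPos(32, 16))
                     | (load(offset + 6, 2) << pickPos(48, 48));
            case 3: case 7:
                return (load(offset, 1) << pickPos(56, 0))
                     | (load(offset + 1, 4) << pickPos(32, 8))
                     | (load(offset + 5, 2) << pickPos(48, 40))
                     | (load(offset + 7, 1) << pickPos(56, 56));
            case 4:
                return (load(offset, 4) << pickPos(32, 0))
                     | (load(offset + 4, 4) << pickPos(32, 32));
            default: // case 0: already 8-byte aligned
                return load(offset, 8);
        }
    }

    // Hypothetical reference: assemble the value one byte at a time.
    long getLongBytewise(long offset) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v |= (mem[(int) offset + i] & 0xFFL) << pickPos(56, 8 * i);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] mem = new byte[24];
        for (int i = 0; i < mem.length; i++) mem[i] = (byte) i;
        boolean be = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;
        UnalignedLongSketch s = new UnalignedLongSketch(mem, be);
        for (int off = 0; off < 16; off++) {
            System.out.printf("offset %2d: %016x%n", off, s.getLongUnaligned(off));
        }
    }
}
```

The sketch only checks that the pieces and shifts line up for both byte orders. It says nothing about how C1, C2 or the interpreter would actually compile the switch.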