Hello! I have a small patch to the Integer and Long classes which speeds up bit reversal significantly.
The last two or three steps of bit reversal perform a byte reversal, so it is possible to use the intrinsified method reverseBytes there. The Java implementation of reverseBytes is similar to those steps, so it should give similar performance when intrinsics are not available.

Here are the results of a few performance tests (thanks for the hints, Aleksej :) ):

old code:

# VM options: -server
ReverseInt.reverse   1  avgt  5  8,766 ± 0,214  ns/op
ReverseLong.reverse  1  avgt  5  9,992 ± 0,165  ns/op

# VM options: -client
ReverseInt.reverse   1  avgt  5  9,168 ± 0,268  ns/op
ReverseLong.reverse  1  avgt  5  9,988 ± 0,123  ns/op

patched:

# VM options: -server
ReverseInt.reverse   1  avgt  5  6,411 ± 0,046  ns/op
ReverseLong.reverse  1  avgt  5  6,299 ± 0,158  ns/op

# VM options: -client
ReverseInt.reverse   1  avgt  5  6,430 ± 0,022  ns/op
ReverseLong.reverse  1  avgt  5  6,301 ± 0,097  ns/op

patched, intrinsics disabled:

# VM options: -server -XX:DisableIntrinsic=_reverseBytes_i,_reverseBytes_l
ReverseInt.reverse   1  avgt  5  9,597 ± 0,206  ns/op
ReverseLong.reverse  1  avgt  5  9,966 ± 0,151  ns/op

# VM options: -client -XX:DisableIntrinsic=_reverseBytes_i,_reverseBytes_l
ReverseInt.reverse   1  avgt  5  9,609 ± 0,069  ns/op
ReverseLong.reverse  1  avgt  5  9,968 ± 0,075  ns/op

As you can see, there is a small slowdown in the int case when intrinsics are disabled. It seems to be caused by the different 'shape' of the byte-reversing code in Integer.reverse and Integer.reverseBytes.
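For reference, here is a minimal sketch (not part of the patch) that checks that the old tail of Integer.reverse and the reverseBytes-based version compute the same result. The class name ReverseShapes and the helper names are hypothetical and exist only for this illustration; only Integer.reverse and Integer.reverseBytes are real JDK methods.

```java
// Hypothetical illustration class, not part of the JDK or of the patch.
class ReverseShapes {

    // The three steps the patch leaves untouched: they reverse the bits
    // inside each byte, but keep the bytes in their original positions.
    static int swapBitsInBytes(int i) {
        i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
        i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
        i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
        return i;
    }

    // Old shape: the byte reversal is written out inline
    // (these are the lines removed by the Integer.java diff).
    static int reverseOld(int i) {
        i = swapBitsInBytes(i);
        return (i << 24) | ((i & 0xff00) << 8) |
               ((i >>> 8) & 0xff00) | (i >>> 24);
    }

    // Patched shape: the byte reversal is delegated to reverseBytes,
    // which the JIT may replace with an intrinsic.
    static int reversePatched(int i) {
        return Integer.reverseBytes(swapBitsInBytes(i));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, 0x80000000, 0x0000ff00};
        for (int x : samples) {
            int expected = Integer.reverse(x);
            if (reverseOld(x) != expected || reversePatched(x) != expected) {
                throw new AssertionError("mismatch for 0x" + Integer.toHexString(x));
            }
        }
        System.out.println("all variants agree");
    }
}
```

This only demonstrates functional equivalence on a handful of inputs; it says nothing about performance, which is what the JMH numbers above are for.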
I tried replacing the reverseBytes code with the corresponding piece of the reverse method, and with that it is as fast as the unpatched case:

ReverseInt.reverse   1  avgt  5  9,184 ± 0,255  ns/op

Diffs from jdk9:

Integer.java:
@@ -1779,9 +1805,8 @@
         i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
         i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
         i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
-        i = (i << 24) | ((i & 0xff00) << 8) |
-            ((i >>> 8) & 0xff00) | (i >>> 24);
-        return i;
+
+        return reverseBytes(i);

Long.java:
@@ -1940,10 +1997,8 @@
         i = (i & 0x5555555555555555L) << 1 | (i >>> 1) & 0x5555555555555555L;
         i = (i & 0x3333333333333333L) << 2 | (i >>> 2) & 0x3333333333333333L;
         i = (i & 0x0f0f0f0f0f0f0f0fL) << 4 | (i >>> 4) & 0x0f0f0f0f0f0f0f0fL;
-        i = (i & 0x00ff00ff00ff00ffL) << 8 | (i >>> 8) & 0x00ff00ff00ff00ffL;
-        i = (i << 48) | ((i & 0xffff0000L) << 16) |
-            ((i >>> 16) & 0xffff0000L) | (i >>> 48);
-        return i;
+
+        return reverseBytes(i);

Jaroslav