AMD started shipping popcnt instruction in 2007, and intel in 2008. Does the OpenJDK runtime include a bitCount intrinsic? I can't speak much to relevance in this regard.
But in terms of performance, http://dalkescientific.com/writings/diary/archive/2011/11/02/faster_popcount_update.html gives a 45% speedup from current jdk algorithm (called "Wikipedia #2") to the propsed one ("Wikipedia #3"). Benchmark source: http://dalkescientific.com/writings/diary/popcnt.cpp The multiply is also the suggested algorithm in the linked PDF (page 195 of http://support.amd.com/techdocs/25112.pdf), an official AMD optimization guide. Isaac On Mon, May 8, 2017 at 4:29 PM, Doug Lea <d...@cs.oswego.edu> wrote: > On 05/08/2017 02:14 PM, Isaac Levy wrote: >> >> Original message below: >> >> The JDK impl of bitCount can be improved -- though most users will get >> the hotspot intrinsic. The best source I could find for the suggestion >> below is page 195: http://support.amd.com/techdocs/25112.pdf >> > > The int version differs from current implementation in that it uses > one multiply instead of two shift+adds. (Similarly for long.) > > I wonder if there is any processor that does not already have a > bit-count instruction for which this is actually faster? > Some evidence either way would be helpful. > > -Doug > > > > >> Cheers, >> Isaac Levy >> >> >> Proposed Long.bitCount and Integer.bitCount: >> >> >> public static int bitCount(long i) >> { >> i -= (i >>> 1) & 0x5555555555555555L; >> i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L); >> i = (i + (i >>> 4)) & 0x0f0f0f0f0f0f0f0fL; >> return (i * 0x0101010101010101L) >>> 56; >> } >> >> >> public static int bitCount(int i) { >> i -= (i >>> 1) & 0x55555555; >> i = (i & 0x33333333) + ((i >>> 2) & 0x33333333); >> i = (i + (i >>> 4)) & 0x0f0f0f0f; >> return (i * 0x01010101) >>> 24; >> } >> >