On Fri, 3 Nov 2023 23:22:27 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>> https://github.com/cassioneri/eaf suggest this code for leap year >> calculation: >> >> public static boolean isLeap(long year) { >> int d = year % 100 != 0 ? 4 : 16; >> return (year & (d - 1)) == 0; >> } >> >> .. with a claim this would compile down to branchless, easily pipelined code. >> >> This doesn't currently happen with C2. In the meantime I think we can >> improve the current code in `Year.isLeap` and `IsoChronology.isLeapYear` by >> leveraging the fact that the `% 100` check is only needed if `(year & 15) != >> 0`: >> >> >> public static boolean isLeap(long year) { >> return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year >> % 100 != 0; >> } >> >> >> Mac M1: >> >> Name Cnt Base Error Test Error Unit >> Change >> LeapYearBench.isLeapYear 15 0,743 ± 0,009 0,994 ± 0,005 ops/us >> 1,34x (p = 0,000*) >> LeapYearBench.isLeapYearChrono 15 0,748 ± 0,006 0,991 ± 0,003 ops/us >> 1,32x (p = 0,000*) >> LeapYearBench.isLeapYearNS 15 0,558 ± 0,026 0,552 ± 0,033 ops/us >> 0,99x (p = 0,602 ) >> * = significant >> >> >> Linux x64: >> >> Name Cnt Base Error Test Error Unit >> Change >> LeapYearBench.isLeapYear 15 0.534 ± 0.001 0.765 ± 0.004 ops/us >> 1.43x (p = 0.000*) >> LeapYearBench.isLeapYearChrono 15 0.535 ± 0.000 0.753 ± 0.040 ops/us >> 1.41x (p = 0.000*) >> LeapYearBench.isLeapYearNS 15 0.352 ± 0.000 0.351 ± 0.001 ops/us >> 1.00x (p = 0.000*) >> * = significant >> >> >> 30% higher throughput on M1, 40% on x64. `isLeapYearNS` runs a variant of >> the code from https://github.com/cassioneri/eaf ported to java - perhaps the >> JIT can be improved to do whatever clang/gcc does here and achieve an even >> better speed-up. >> >> Testing: so far only java/time/tck/java/time locally, will run a few tiers >> before filing an enhancement and opening the PR for review. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Apply similar optimization to GregorianCalendar, sun.util.calendar Yes, seems Granlund & Montgomery is used, see https://github.com/openjdk/jdk/blob/01c0d5dd0a4f7587288219bad8ed4648f4e456ce/src/hotspot/share/opto/divnode.cpp#L161 I explored some more with micros that don't loop over different values but instead test all the year variants of interest in isolation: Benchmark (year) Mode Cnt Score Error Units LeapYearBench.Single.isLeapYear 2000 avgt 15 0,590 ± 0,053 ns/op LeapYearBench.Single.isLeapYear 2017 avgt 15 0,586 ± 0,030 ns/op LeapYearBench.Single.isLeapYear 2004 avgt 15 0,936 ± 0,002 ns/op LeapYearBench.Single.isLeapYear 2100 avgt 15 0,937 ± 0,002 ns/op LeapYearBench.Single.isLeapYearNS 2000 avgt 15 2,117 ± 0,028 ns/op LeapYearBench.Single.isLeapYearNS 2017 avgt 15 2,114 ± 0,025 ns/op LeapYearBench.Single.isLeapYearNS 2004 avgt 15 2,115 ± 0,019 ns/op LeapYearBench.Single.isLeapYearNS 2100 avgt 15 2,111 ± 0,019 ns/op When isolating like this the suggested `(v & 15) == 0`-first approach still wins, and the generated code across the tests appear to be about as branchy. I suggest we go ahead and integrate this, file an RFE to re-examine the division-by-constant in C2, then re-evaluate these `isLeapYear` micros in that new environment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16491#issuecomment-1793913973