On Fri, 3 Nov 2023 23:22:27 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> https://github.com/cassioneri/eaf suggest this code for leap year 
>> calculation:
>> 
>>     public static boolean isLeap(long year) {
>>         int d = year % 100 != 0 ? 4 : 16;
>>         return (year & (d - 1)) == 0;
>>     }
>> 
>> .. with a claim this would compile down to branchless, easily pipelined code.
>> 
>> This doesn't currently happen with C2. In the meantime I think we can 
>> improve the current code in `Year.isLeap` and `IsoChronology.isLeapYear` by 
>> leveraging the fact that the `% 100` check is only needed if `(year & 15) != 
>> 0`:
>> 
>> 
>>     public static boolean isLeap(long year) {
>>         return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year 
>> % 100 != 0;
>>     }
>> 
>> 
>> Mac M1:
>> 
>> Name                           Cnt  Base   Error   Test   Error   Unit  
>> Change
>> LeapYearBench.isLeapYear        15 0,743 ± 0,009  0,994 ± 0,005 ops/us   
>> 1,34x (p = 0,000*)
>> LeapYearBench.isLeapYearChrono  15 0,748 ± 0,006  0,991 ± 0,003 ops/us   
>> 1,32x (p = 0,000*)
>> LeapYearBench.isLeapYearNS      15 0,558 ± 0,026  0,552 ± 0,033 ops/us   
>> 0,99x (p = 0,602 )
>>   * = significant
>> 
>> 
>> Linux x64:
>> 
>> Name                           Cnt  Base   Error   Test   Error   Unit  
>> Change
>> LeapYearBench.isLeapYear        15 0.534 ± 0.001  0.765 ± 0.004 ops/us   
>> 1.43x (p = 0.000*)
>> LeapYearBench.isLeapYearChrono  15 0.535 ± 0.000  0.753 ± 0.040 ops/us   
>> 1.41x (p = 0.000*)
>> LeapYearBench.isLeapYearNS      15 0.352 ± 0.000  0.351 ± 0.001 ops/us   
>> 1.00x (p = 0.000*)
>>   * = significant
>> 
>> 
>> 30% higher throughput on M1, 40% on x64. `isLeapYearNS` runs a variant of 
>> the code from https://github.com/cassioneri/eaf ported to java - perhaps the 
>> JIT can be improved to do whatever clang/gcc does here and achieve an even 
>> better speed-up.
>> 
>> Testing: so far only java/time/tck/java/time locally, will run a few tiers 
>> before filing an enhancement and opening the PR for review.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Apply similar optimization to GregorianCalendar, sun.util.calendar

Yes, seems Granlund & Montgomery is used, see 
https://github.com/openjdk/jdk/blob/01c0d5dd0a4f7587288219bad8ed4648f4e456ce/src/hotspot/share/opto/divnode.cpp#L161
 

I explored some more with micros that don't loop over different values but 
instead test all the year variants of interest in isolation:

Benchmark                          (year)  Mode  Cnt  Score   Error  Units
LeapYearBench.Single.isLeapYear      2000  avgt   15  0,590 ± 0,053  ns/op
LeapYearBench.Single.isLeapYear      2017  avgt   15  0,586 ± 0,030  ns/op
LeapYearBench.Single.isLeapYear      2004  avgt   15  0,936 ± 0,002  ns/op
LeapYearBench.Single.isLeapYear      2100  avgt   15  0,937 ± 0,002  ns/op
LeapYearBench.Single.isLeapYearNS    2000  avgt   15  2,117 ± 0,028  ns/op
LeapYearBench.Single.isLeapYearNS    2017  avgt   15  2,114 ± 0,025  ns/op
LeapYearBench.Single.isLeapYearNS    2004  avgt   15  2,115 ± 0,019  ns/op
LeapYearBench.Single.isLeapYearNS    2100  avgt   15  2,111 ± 0,019  ns/op


When isolating like this the suggested `(v & 15) == 0`-first approach still 
wins, and the generated code across the tests appear to be about as branchy. 

I suggest we go ahead and integrate this, file an RFE to re-examine the 
division-by-constant in C2, then re-evaluate these `isLeapYear` micros in that 
new environment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16491#issuecomment-1793913973

Reply via email to