On Wed, 18 May 2022 14:59:33 GMT, Quan Anh Mai <d...@openjdk.java.net> wrote:
>> Hi,
>>
>> This patch optimises the matching rules for floating-point comparison with
>> respect to eq/ne on x86-64
>>
>> 1, When the inputs of a comparison are the same (i.e. `isNaN` patterns), `ZF`
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>>
>>     ucomiss xmm0, xmm0
>>     jp      label
>>     jne     label
>>
>> into
>>
>>     ucomiss xmm0, xmm0
>>     jp      label
>>
>> 2, The move rules for `cmpOpUCF2` are missing, which makes patterns such as
>> `x == y ? 1 : 0` fall back to `cmpOpU`, which has a really high cost of
>> fixing the flags, such as
>>
>>     xorl    ecx, ecx
>>     ucomiss xmm0, xmm1
>>     jnp     done
>>     pushf
>>     andq    [rsp], 0xffffff2b
>>     popf
>> done:
>>     movl    eax, 1
>>     cmovel  eax, ecx
>>
>> The patch changes this sequence into
>>
>>     xorl    ecx, ecx
>>     ucomiss xmm0, xmm1
>>     movl    eax, 1
>>     cmovpl  eax, ecx
>>     cmovnel eax, ecx
>>
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised
>> by using `Math.abs` to reduce 1 comparison and compares the result with
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>>
>> The benchmark results are as follows:
>>
>> Before:
>> Benchmark                      Mode  Cnt     Score     Error  Units
>> FPComparison.equalDouble       avgt    5  2876.242 ±  58.875  ns/op
>> FPComparison.equalFloat        avgt    5  3062.430 ±  31.371  ns/op
>> FPComparison.isFiniteDouble    avgt    5   475.749 ±  19.027  ns/op
>> FPComparison.isFiniteFloat     avgt    5   506.525 ±  14.417  ns/op
>> FPComparison.isInfiniteDouble  avgt    5  1232.800 ±  31.677  ns/op
>> FPComparison.isInfiniteFloat   avgt    5  1234.708 ±  70.239  ns/op
>> FPComparison.isNanDouble       avgt    5  2255.847 ±   7.238  ns/op
>> FPComparison.isNanFloat        avgt    5  2567.044 ±  36.078  ns/op
>>
>> After:
>> Benchmark                      Mode  Cnt     Score     Error  Units
>> FPComparison.equalDouble       avgt    5   594.636 ±   8.922  ns/op
>> FPComparison.equalFloat        avgt    5   663.849 ±   3.656  ns/op
>> FPComparison.isFiniteDouble    avgt    5   518.309 ± 107.352  ns/op
>> FPComparison.isFiniteFloat     avgt    5   515.576 ±  14.669  ns/op
>> FPComparison.isInfiniteDouble  avgt    5   621.185 ±  11.935  ns/op
>> FPComparison.isInfiniteFloat   avgt    5   623.566 ±  15.206  ns/op
>> FPComparison.isNanDouble       avgt    5   400.124 ±   0.762  ns/op
>> FPComparison.isNanFloat        avgt    5   546.486 ±   1.509  ns/op
>>
>> Thank you very much.
>
> I have reverted the changes to `java.lang.Float` and `java.lang.Double` to
> not interfere with the intrinsic PR. More tests are added to cover all cases
> regarding floating-point comparison of compiled code.
>
> The rules for fp comparison that output the result to `rFlagRegsU` are
> expensive and should be avoided. As a result, I removed the shortcut rules
> with memory or constant operands to reduce the number of match rules. Only
> the basic rules are kept.
>
> Thanks.

@merykitty Very nice work! The patch looks good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8525
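
For readers following the thread, here is a minimal Java sketch of the source-level patterns the quoted description refers to: the self-comparison `isNaN` idiom (point 1), the `x == y ? 1 : 0` conditional move (point 2), and the `Math.abs`-based `isInfinite` form (point 3). The class and method names are hypothetical and are not taken from the patch or from its `FPComparison` benchmark. The `java.lang.Float` and `java.lang.Double` changes were reverted in this PR, so this only illustrates the shapes under discussion, not code that was merged.

```java
// Hypothetical illustration; names are not from the patch or the JDK.
public class FpComparisonSketch {

    // Point 1: comparing a value with itself. x != x holds only for NaN.
    static boolean isNaN(float x) {
        return x != x;
    }

    // Point 2: an eq comparison feeding a conditional move rather than a branch.
    static int equalAsInt(float x, float y) {
        return x == y ? 1 : 0;
    }

    // Two equality comparisons against the infinities.
    static boolean isInfiniteTwoCompares(float x) {
        return x == Float.POSITIVE_INFINITY || x == Float.NEGATIVE_INFINITY;
    }

    // Point 3: one ordered comparison after Math.abs.
    // NaN compares false against everything, so it is rejected here as well.
    static boolean isInfiniteAbs(float x) {
        return Math.abs(x) > Float.MAX_VALUE;
    }

    public static void main(String[] args) {
        float[] samples = {
            0.0f, -0.0f, 1.5f, Float.MAX_VALUE, -Float.MAX_VALUE,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN
        };
        for (float v : samples) {
            // Both isInfinite forms are expected to agree on every input.
            System.out.println(v + " isNaN=" + isNaN(v)
                + " isInfinite=" + isInfiniteAbs(v)
                + " agrees=" + (isInfiniteAbs(v) == isInfiniteTwoCompares(v)));
        }
        System.out.println("1.0f == 1.0f as int: " + equalAsInt(1.0f, 1.0f));
        System.out.println("NaN == NaN as int: " + equalAsInt(Float.NaN, Float.NaN));
    }
}
```

The two `isInfinite` forms return the same result for every input, including NaN and signed zero. Whether one is faster depends on the matching rules the JIT applies, which is what the benchmark figures in the quoted message measure.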