Thanks Andrew for your prompt reply.

The results below are for my PoC, which is as close to the proposed patch
as I could make it. I used the PoC because I can't get chrono with my patch
onto godbolt for a direct comparison between current and patched chrono.

I tried all the platforms on which I could get it to compile. Please
double check everything because I might be misreading some results,
especially on the platforms that I'm not familiar with. Sometimes godbolt
seems to have issues and cuts pieces of the generated assembly from the
output. I've marked these cases with (X).

> I suspect we want to disable this for -Os
>

Below are the sizes with -Os. Most of the time the new code is shorter than
the old one with a few exceptions where they are the same size (because the
platform doesn't seem to support [[assume]]). The new code is never longer.
On each link, the middle panel shows the result for the old code and the
right panel for the new code. These panels have tabs for different
platforms.
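
To illustrate why a target without [[assume]] ends up with identical old
and new code, here is a minimal sketch. It is not the PoC itself: the
macro POC_ASSUME, the function small_mod7 and the bound 1000000 are all
hypothetical names and values chosen for illustration. When the attribute
is unavailable the macro expands to nothing, so the optimiser sees plain
n % 7 and has no extra range information to exploit.

```cpp
#include <cstdint>

// POC_ASSUME is a hypothetical wrapper: it uses the C++23 [[assume]]
// attribute when the compiler reports support for it, and otherwise
// expands to a no-op, which leaves the generated code unchanged.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(assume)
#    define POC_ASSUME(expr) [[assume(expr)]]
#  endif
#endif
#ifndef POC_ASSUME
#  define POC_ASSUME(expr) static_cast<void>(0)
#endif

// Hypothetical helper: the hint promises the optimiser that n is small,
// which may let it pick a cheaper instruction sequence. The returned
// value is the same with or without the hint, provided the promise holds.
inline std::uint32_t small_mod7(std::uint32_t n) {
    POC_ASSUME(n < 1000000u);  // illustrative bound, not from the patch
    return n % 7u;
}
```

Calling small_mod7 with an argument that violates the assumption is
undefined behaviour where the attribute is honoured, so the bound has to
be a real invariant of the caller.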

                Old   New

https://godbolt.org/z/hfz9szEWf

x86-64         0x81  0x69
ARM32          0x78  0x68
ARM64          0x81  0x71
ARM64 Morello  0x48  0x48
HPPA           0xf8  0xc8
KVX ACB        0xec  0xcc
loongarch64    0x94  0x8c  (X)

https://godbolt.org/z/eMfzoPhT5

M68K           0xb6  0xa6
MinGW          0xa0  0x80  (X)
mips           0xdc  0xac
mips64         0xcc  0xb8
mips64 (el)    0xbc  0xa8
mipsel         0xe0  0xb0
MRISC32        0xa4  0x74
power          0xb8  0x80

https://godbolt.org/z/PjqbTqK6b

power64        0xa8  0x8c
power64le      0xa4  0x88
RISC-V (32)    0x90  0x7e  (X)
RISC-V (64)    0x86  0x86  (X)
s390x          0xf0  0x90
sh             0xc2  0xb2
SPARC          0xc0  0x98
SPARC LEON     0xbc  0x94

https://godbolt.org/z/7oebGMYTM

SPARC64        0xac  0x94
TI C6x         0xc4  0x98
Tricore        0xb0  0xb0
VAX            0xc8  0xc5


> And plus i am not 100% convinced it is best for all micro-architectures.
> Especially on say aarch64.
> Can you do more benchmarking and provide which exact core is being used?
>

I don't have access to any platform other than x86-64 to do benchmarks :-(

> And mention the size difference too?
>

Same exercise as explained above, but with -O2:

                 Old    New
https://godbolt.org/z/eqGo9xnz3

x86-64          0xa4   0x72
ARM32           0xa8   0x74
ARM64           0x98   0x80
ARM64 Morello  0x14c  0x14c
HPPA           0x134   0xc8
KVX ACB         0xf4   0x98
loongarch64     0xac   0x9c  (X)

https://godbolt.org/z/7qh94zGMK

M68K           0x13a   0xa2
MinGW           0xd0   0x80  (X)
mips           0x11c   0xe4
mips64         0x130   0xf0
mips64 (el)    0x120   0xe0
mipsel         0x120   0xe8
MRISC32         0xa0   0x74
power           0xdc   0x88

https://godbolt.org/z/Y11Trnqc1

power64         0xd0   0x94
power64le       0xd0   0x90
RISC-V (32)     0xbc   0x84  (X)
RISC-V (64)     0xbe   0x94  (X)
s390x           0xf0   0xa8
sh              0xc6   0xcc  (*)
SPARC          0x108   0x9c
SPARC LEON      0xf4   0x94

https://godbolt.org/z/h456PTEWh

SPARC64         0xc0   0xa0
TI C6x         0x108   0xb0
Tricore         0xe8   0xea  (*)
VAX             0xdc   0xdc

(*) These are the only cases where the new code is larger than the old one.

> Plus gcc knows how to do %7 via multiplication is that being used or is it
> due to generic x86 tuning it is using the div instruction?
>

Yes and no. On x86-64 (and probably many other platforms) the current
optimisation for n % 7 is a byproduct of the optimisation for /; that is,
to calculate n % 7, the generated code evaluates n - (n / 7) * 7. The
quotient q = n / 7 is optimised to avoid div, using a multiplication and
other cheaper operations. In total it evaluates 2 multiplications plus
shifts, adds, subs and movs. (One multiplication is q * 7, which is
performed with LEA + sub.) The algorithm that I'm suggesting performs only
one multiplication and one. Below are the comparisons of n % 7 and the
proposed algorithm.

https://godbolt.org/z/o7dazs4Gc
https://godbolt.org/z/zP79736WK
https://godbolt.org/z/65x7naMfq
https://godbolt.org/z/z9ofaMzex
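
For readers who prefer source to assembly, the two strategies can be
sketched in C++ as below. This is an illustration under stated
assumptions, not the code behind the links: the function names are
hypothetical, the magic constant 0x24924925 is the one commonly used for
unsigned 32-bit division by 7, and mod7_mul_shift is a well-known
multiply-and-shift trick that may or may not match the proposed
algorithm. Unlike the first function, it is exact only for a bounded
range of n.

```cpp
#include <cassert>
#include <cstdint>

// Quotient n / 7 without a div instruction: multiply-high plus shifts.
// Assumption: magic == ceil(2^35 / 7) - 2^32.
constexpr std::uint32_t div7_mulhi(std::uint32_t n) {
    constexpr std::uint64_t magic = 0x24924925u;
    // High 32 bits of the 64-bit product n * magic.
    const auto t = static_cast<std::uint32_t>((n * magic) >> 32);
    // floor((n + t) / 8), written so that n + t cannot overflow.
    return (((n - t) >> 1) + t) >> 2;
}

// Remainder as a byproduct of the quotient: n - (n / 7) * 7.
// The q * 7 step is the second multiplication mentioned in the text.
constexpr std::uint32_t mod7_via_quotient(std::uint32_t n) {
    const std::uint32_t q = div7_mulhi(n);
    return n - q * 7u;
}

// Hypothetical single-multiplication variant. 613566757 == ceil(2^32 / 7);
// the top three bits of the wrapped 32-bit product equal n % 7, but only
// while n < 178956973. Larger inputs can give a wrong result.
constexpr std::uint32_t mod7_mul_shift(std::uint32_t n) {
    return static_cast<std::uint32_t>(n * UINT32_C(613566757)) >> 29;
}

// Compares both sketches against the built-in % for every n below limit.
inline void self_test(std::uint32_t limit) {
    for (std::uint32_t n = 0; n < limit; ++n) {
        assert(mod7_via_quotient(n) == n % 7u);
        assert(mod7_mul_shift(n) == n % 7u);
    }
}
```

The first function works for every 32-bit n, whereas the second relies on
the caller guaranteeing a bounded argument, which is the kind of promise
an [[assume]] hint can express.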

I hope this helps.
Cassio.
