https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #10 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #9)
> Actually for older cores I think the manufacturers do not care much.  I
> still have a working Bulldozer machine and I can do some testing.
> I think in Bulldozer case I was basing the latency/throughput on data in
> Agner Fog's manuals.

Ahhh, how could I forget that his manuals have data for those cores too. Thanks
for the reminder! This solves the conundrum nicely:

AMD Jaguar ('btver2' in GCC): int/fp division is not pipelined, separate int/fp
dividers;

AMD Bulldozer, Steamroller ('bdver1', 'bdver3'): int division is not pipelined
(one divider), fp division is slightly pipelined (two independent dividers);

Zhaoxin Lujiazui appears to use the same divider as VIA Nano 3000, which is not
pipelined.

So it's already enough to produce a decent patch.
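
As a rough sketch of how "not pipelined" can be read off a latency/throughput
table (not necessarily how the patch derives it): an unpipelined divider cannot
start a new division until the previous one finishes, so its reciprocal
throughput is close to its latency. The function name, the 20% tolerance and
the cycle counts in the usage note are illustrative assumptions, not figures
taken from Agner Fog's manuals.

```python
def classify_divider(latency, recip_throughput, tolerance=0.2):
    """Guess how pipelined a divider is from two table figures, in cycles.

    latency          -- cycles from issuing one division to its result
    recip_throughput -- average cycles between independent divisions
    tolerance        -- hypothetical slack for measurement noise
    """
    if latency <= 0 or recip_throughput <= 0:
        raise ValueError("cycle counts must be positive")
    # A new division starts only when the previous one is (almost) done.
    if recip_throughput >= (1.0 - tolerance) * latency:
        return "not pipelined"
    # One independent division can start every cycle or faster.
    if recip_throughput <= 1.0:
        return "fully pipelined"
    # Somewhere in between. Table figures alone cannot tell a partially
    # pipelined divider apart from several independent unpipelined ones.
    return "partially pipelined"
```

With made-up numbers, classify_divider(20, 20) gives "not pipelined" and
classify_divider(20, 10) gives "partially pipelined"; the second case matches
what two independent unpipelined dividers would look like from the outside.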

> How do you test it?

For AMD Zen patches I was using measurements by Andreas Abel (
https://uops.info/table_overview.html ) and running a few experiments myself by
coding loops in NASM and timing them with 'perf stat' on a Zen 2 CPU.
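
A hedged illustration of how such timings turn into per-instruction numbers
(not the actual scripts used for the Zen patches): time one loop in which each
division depends on the previous result and one in which the divisions are
independent, then divide the cycle totals reported by 'perf stat' by the number
of divisions executed. The dependent loop approximates latency, the independent
loop approximates reciprocal throughput. The function name and the sample
totals are made up for the example, and loop overhead is ignored.

```python
def cycles_per_division(total_cycles, iterations, divs_per_iteration=1):
    """Average cycles spent per division in a timed loop.

    total_cycles       -- 'cycles' count reported for the whole run
    iterations         -- number of loop iterations executed
    divs_per_iteration -- divisions in the loop body (unroll factor)
    """
    if iterations <= 0 or divs_per_iteration <= 0:
        raise ValueError("iterations and divs_per_iteration must be positive")
    return total_cycles / (iterations * divs_per_iteration)


# Hypothetical cycle totals, not real measurements.
# Loop A: one division per iteration, each depending on the previous result.
latency = cycles_per_division(2_500_000_000, 100_000_000)
# Loop B: four independent divisions per iteration.
recip_throughput = cycles_per_division(10_000_000_000, 100_000_000, 4)

# If both figures are about equal, independent divisions gain nothing from
# overlapping, which is what an unpipelined divider looks like.
print(f"latency ~{latency:.1f} cycles, "
      f"reciprocal throughput ~{recip_throughput:.1f} cycles")
```

In practice the loop counter and branch add a small overhead per iteration, so
unrolling the loop body keeps that overhead from distorting the result.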
