https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #10 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #9)
> Actually for older cores I think the manufacturers do not care much. I
> still have a working Bulldozer machine and I can do some testing.
> I think in Bulldozer case I was basing the latency throughput on data in
> Agner Fog's manuals.

Ahhh, how could I forget that his manuals have data for those cores too.
Thanks for the reminder! This solves the conundrum nicely:

AMD Jaguar ('btver2' in GCC): int/fp division is not pipelined, separate
int/fp dividers;

AMD Bulldozer, Steamroller ('bdver1', 'bdver3'): int division is not pipelined
(one divider), fp division is slightly pipelined (two independent dividers);

Zhaoxin Lujiazui appears to use the same divider as VIA Nano 3000, which is not
pipelined.

So it's already enough to produce a decent patch.

> How do you test it?

For AMD Zen patches I was using measurements by Andreas Abel
( https://uops.info/table_overview.html ) and running a few experiments myself
by coding loops in NASM and timing them with 'perf stat' on a Zen 2 CPU.
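
To make the "not pipelined" versus "two independent dividers" distinction concrete, here is a rough cycle-count model in Python. It is only a sketch: the function names and the latency of 20 cycles are made up for illustration (they are not taken from Agner Fog's tables or from uops.info), and treating "slightly pipelined" as two non-pipelined units working side by side is one simplified reading of the classification above, not a description of the actual hardware.

```python
from math import ceil


def cycles_dependent(n_divs: int, latency: int) -> int:
    """Chain of divisions where each one needs the previous result.

    The latencies simply add up, regardless of how many dividers exist.
    """
    return n_divs * latency


def cycles_independent(n_divs: int, latency: int, dividers: int = 1) -> int:
    """Independent divisions on `dividers` non-pipelined units.

    A non-pipelined unit stays busy for the full latency of each division,
    so the work proceeds in ceil(n_divs / dividers) rounds.
    """
    if n_divs == 0:
        return 0
    return ceil(n_divs / dividers) * latency


def cycles_pipelined(n_divs: int, latency: int, issue_interval: int = 1) -> int:
    """Fully pipelined unit, for contrast only.

    A new operation may start every `issue_interval` cycles.
    """
    if n_divs == 0:
        return 0
    return latency + (n_divs - 1) * issue_interval


if __name__ == "__main__":
    LAT = 20  # hypothetical division latency in cycles
    N = 8     # number of divisions in the test loop body

    print("dependent chain:         ", cycles_dependent(N, LAT))
    print("independent, 1 divider:  ", cycles_independent(N, LAT, dividers=1))
    print("independent, 2 dividers: ", cycles_independent(N, LAT, dividers=2))
    print("independent, pipelined:  ", cycles_pipelined(N, LAT))
```

In this model a single non-pipelined divider gives the same cycle count for the dependent chain and for the independent divisions. That equality is the signature one would look for when timing the two kinds of loops; a second divider halves the independent case only.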