https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99462

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(For context: the above patch was for PR 98856, but it is based on an incorrect
latency analysis; see bug 98856 comment #38.)

Right now the schedulers cannot easily split instructions for that purpose: it
would require computing the dependency graph more accurately. Currently,
dependencies and priorities are computed with respect to instructions as a
whole; intelligent splitting would require tracking latencies with respect to
individual inputs.
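
To illustrate the difference, here is a toy sketch in C. It is not GCC code:
the struct, the function names and the cycle counts are all hypothetical, and
it models a single load-op instruction with one address input and one register
input.

```c
/* Toy model only: names and latencies are illustrative assumptions,
   not taken from GCC's scheduler or from any real CPU. */
struct load_op {
    int addr_ready;    /* cycle at which the address input is available */
    int reg_ready;     /* cycle at which the register input is available */
    int load_latency;  /* cycles taken by the load part */
    int op_latency;    /* cycles taken by the arithmetic part */
};

static int max_int(int a, int b) { return a > b ? a : b; }

/* Whole-instruction view: the insn becomes ready only when all of its
   inputs are ready, and then pays the full load + op latency. */
int finish_whole(const struct load_op *i)
{
    int start = max_int(i->addr_ready, i->reg_ready);
    return start + i->load_latency + i->op_latency;
}

/* Per-input view: the load part may start as soon as the address is
   ready; only the op part waits for the register input. */
int finish_per_input(const struct load_op *i)
{
    int loaded = i->addr_ready + i->load_latency;
    return max_int(loaded, i->reg_ready) + i->op_latency;
}
```

With addr_ready = 0, reg_ready = 4, load_latency = 4 and op_latency = 1, the
whole-instruction view finishes at cycle 9 and the per-input view at cycle 5.
A scheduler would need the second kind of bookkeeping to judge whether a split
is worthwhile.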

sel-sched does not split, but it can perform "renaming", which basically
overcomes anti-dependencies by scheduling the desired instruction before the
conflicting write (choosing a different output register for it) and emitting a
reg-reg move later.

I think on modern x86 the profitability of such splitting is quite dubious,
because it would increase the number of instructions and uops flowing through
the CPU front-end and entering the renamer (which is one of the narrowest
points in the pipeline). Especially on AMD, where not only load-op, but also
load-op-store instructions are renamed as a single uop (which is then sent to
two or three execution units).

I think in common cases where the overall critical path is unchanged (as in the
given examples of pinsrq and various load-op instructions) GCC should simply
continue emitting the combined form.
