On 11/20/25 9:37 AM, Richard Biener wrote:
On 11/20/25 4:53 PM, Jeff Law <[email protected]> wrote:
On 11/19/25 9:42 AM, Andi Kleen wrote:
I know I was pushing for it to be enabled more widely as it's painfully hard
to forward from a narrow store to a wider load. But based on earlier
discussions I've backed off that position.
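For concreteness, a minimal C sketch of the narrow-store/wide-load pattern (the type and function names here are purely illustrative, not anything from the patch):

  /* Two 16-bit stores followed by a 32-bit load of the same bytes: the
     load needs data from both stores, so hardware store-to-load
     forwarding typically cannot satisfy it and the load waits for the
     stores to drain.  */
  union pair { unsigned short h[2]; unsigned int w; };

  unsigned int
  narrow_to_wide (union pair *p, unsigned short a, unsigned short b)
  {
    p->h[0] = a;   /* narrow store */
    p->h[1] = b;   /* narrow store */
    return p->w;   /* wider load covering both stores */
  }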
FWIW I would expect any slightly better OOO core aimed at general
purpose code to have some form of hardware support for a subset of the
cases.
The narrow-store-to-wide-load case is the problem space, even for OOO cores. I fully
expect any modern performance core to forward when the load can get all of its
data from a single prior store.
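For contrast, a load that gets all of its bytes from one prior store (again just an illustrative sketch):

  #include <string.h>

  /* The 4-byte store covers the 2-byte load completely, so the store
     buffer can forward the data and there is no stall.  */
  unsigned short
  contained_load (void *p, unsigned int v)
  {
    unsigned short r;
    memcpy (p, &v, sizeof v);   /* wide store */
    memcpy (&r, p, sizeof r);   /* narrow load fully inside the store */
    return r;
  }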
The rules can be very complicated. As an example, see the diagram
in https://chipsandcheese.com/p/a-peek-at-sapphire-rapids
https://substackcdn.com/image/fetch/$s_!rESw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b17f38-0631-424d-8e05-7988f9b174f6_2559x1214.png
They don't look significantly more complex than I expected. Essentially, if the
load is fully contained within the store, it's forwarded, with a possible
penalty if the start addresses don't match exactly, but it's still forwarded.
If there's only a partial overlap, no store-to-load forwarding occurs and you
take the full 19-cycle penalty.
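I.e. something like this made-up sketch would hit the no-forwarding case:

  #include <string.h>

  /* The 4-byte load overlaps only bytes 2-3 of the 2-byte store, so the
     core cannot forward at all and the load takes the full penalty.  */
  unsigned int
  partial_overlap (unsigned char *p, unsigned short v)
  {
    unsigned int r;
    memcpy (p + 2, &v, sizeof v);   /* 2-byte store at p+2 */
    memcpy (&r, p, sizeof r);       /* 4-byte load at p, partially overlapping */
    return r;
  }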
There’s also the strategy of increasing the issue distance between the store and the load.
Some OOO implementations now try to anticipate the conflict and delay the load. The compiler
could do its own thing here during scheduling (which usually works toward the contrary goal of
delaying stores and issuing loads as early as possible).
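As a rough illustration of what the scheduler could do (hypothetical names again), placing independent work between the stores and the dependent load:

  /* If independent work can be scheduled between the narrow stores and
     the wide load, the stores may have drained from the store buffer by
     the time the load issues, hiding the failed-forwarding penalty.  */
  union pair2 { unsigned short h[2]; unsigned int w; };

  unsigned int
  spaced (union pair2 *p, unsigned short a, unsigned short b, unsigned int x)
  {
    unsigned int t = 0;
    p->h[0] = a;                    /* narrow stores */
    p->h[1] = b;
    for (int i = 0; i < 8; i++)     /* independent work grows the issue distance */
      t += x + (unsigned int) i;
    return p->w + t;                /* the dependent wide load issues later */
  }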
Yup. Memory dependence predictors based on TAGE should be commonplace
going forward.
Jeff