On 3/25/25 3:47 AM, Robin Dapp via Gcc wrote:
I am revisiting an effort to make the number of lanes for vector segment
load/store a tunable parameter.

A year ago, Robin added minimal and not-yet-tunable
common_vector_cost::segment_permute_[2-8]

But it is tunable, just not a param? :)  We have our own cost structure in our downstream repo, adjusted to our uarch.  I suggest you do the same or upstream a separate cost structure.  I don't think anybody would object to having several of those, one for each uarch (as long as they are sufficiently distinct).
Yea, strongly recommend this. From a user interface standpoint it's better to have the compiler select something sensible based on -mcpu or -mtune rather than having to specify a bunch of --params.

In general --params should be seen as knobs compiler developers use to adjust behavior, say to change a limit. They're not really meant to be something we expect users to twiddle much.

For internal testing, sure, a param is great since it allows you to sweep through a bunch of values without rebuilding the toolchain. Once a sensible value is determined, you put that into the costing table for your uarch.
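
To make the split between "param for sweeping" and "cost table for shipping" concrete, here is a minimal standalone sketch. It is not GCC code: SegmentPermuteCosts, the "example-uarch" tune name, the cost numbers, and segment_permute_cost() are all hypothetical, and the real common_vector_cost structure is laid out differently. The idea is only that the tuning name selects a table, and an optional developer override (standing in for a --param) wins when present.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for a per-uarch cost structure. The real
// common_vector_cost in GCC has more fields and a different layout.
struct SegmentPermuteCosts {
  // Index 0 holds the cost for 2 lanes, index 6 the cost for 8 lanes.
  std::array<int, 7> permute;
};

// Made-up numbers, for illustration only.
inline constexpr SegmentPermuteCosts kGenericCosts{{1, 1, 1, 1, 1, 1, 1}};
inline constexpr SegmentPermuteCosts kExampleUarchCosts{{1, 2, 2, 4, 4, 8, 8}};

// Select a table from the tuning name, the way -mtune/-mcpu would.
inline const SegmentPermuteCosts& costs_for_tune(std::string_view tune) {
  return tune == "example-uarch" ? kExampleUarchCosts : kGenericCosts;
}

// Return the permute cost for a segment access with `lanes` fields.
// `param_override` models a developer-only knob used while sweeping
// values; once a good value is found it belongs in the table instead.
inline int segment_permute_cost(std::string_view tune, int lanes,
                                std::optional<int> param_override = std::nullopt) {
  assert(lanes >= 2 && lanes <= 8);
  if (param_override) {
    return *param_override;
  }
  return costs_for_tune(tune).permute[lanes - 2];
}
```

With this shape, a user only ever passes the tuning name, and the override argument stays an internal testing aid.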




BTW, just tangentially related and I don't know how sensitive your uarch is to scheduling, but with the x264 SAD and other sched issues we have seen you might consider disabling sched1 as well for your uarch?  I know that for our uarch we want to keep it on but we surely could have another generic-like mtune option that disables it (maybe even generic-ooo and change the current generic-ooo to generic-in-order?).  I would expect this to get more common in the future anyway.
To expand a bit. The first scheduling pass is important for x264's SAD because by exposing the load->use latency, it will tend to create overlapping lifetimes for the pseudo registers holding the input vectors.

With the overlapping lifetimes the register allocator is then forced to allocate the pseudos to different physical registers, thus ensuring parallelism. The more out-of-order capable the target design is, the less useful sched1 will be.
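
The lifetime argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how GCC's scheduler or register allocator works: Insn and max_live() are hypothetical names, each pseudo is defined once, and the accumulator is ignored. It counts how many input pseudos are live at once for a given instruction order, which is a lower bound on the number of distinct registers the allocator must use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy instruction: defines pseudo `def` (or -1 for none) and reads `uses`.
struct Insn {
  int def;
  std::vector<int> uses;
};

// Peak number of simultaneously live pseudos for a straight-line sequence.
// A pseudo is live from its definition until its last use.
int max_live(const std::vector<Insn>& seq) {
  // Record the index of the last instruction that reads each pseudo.
  std::map<int, std::size_t> last_use;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    for (int u : seq[i].uses) {
      last_use[u] = i;
    }
  }

  int live = 0;
  int peak = 0;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    // Operands are read first, so a pseudo dying here frees its register
    // before this instruction's own result needs one.
    for (int u : seq[i].uses) {
      auto it = last_use.find(u);
      if (it != last_use.end() && it->second == i) {
        --live;
        last_use.erase(it);  // guard against the same operand listed twice
      }
    }
    assert(live >= 0);
    // Only count a definition that is actually read later.
    if (seq[i].def >= 0 && last_use.count(seq[i].def) != 0) {
      ++live;
      peak = std::max(peak, live);
    }
  }
  return peak;
}
```

For two SAD-like steps, the order load, load, use, load, load, use peaks at two live inputs, so the second pair can reuse the first pair's registers. Hoisting all four loads ahead of the uses, as a latency-aware first scheduling pass tends to do, peaks at four and forces four distinct registers.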


Jeff
