On 3/25/25 3:47 AM, Robin Dapp via Gcc wrote:
>> I am revisiting an effort to make the number of lanes for vector segment
>> load/store a tunable parameter.
>> A year ago, Robin added minimal and not-yet-tunable
>> common_vector_cost::segment_permute_[2-8]
> But it is tunable, just not a param? :) We have our own cost structure
> in our downstream repo, adjusted to our uarch. I suggest you do the
> same or upstream a separate cost structure. I don't think anybody would
> object to having several of those, one for each uarch (as long as they
> are sufficiently distinct).
Yea, strongly recommend this. From a user interface standpoint it's
better to have the compiler select something sensible based on -mcpu or
-mtune rather than having to specify a bunch of --params.
In general --params should be seen as knobs compiler developers use to
adjust behavior, say change a limit. They're not really something we
expect users to twiddle much.
For internal testing, sure, a param is great since it allows you to
sweep through a bunch of values without rebuilding the toolchain. Once
a sensible value is determined, you put that into the costing table for
your uarch.
> BTW, just tangentially related and I don't know how sensitive your uarch
> is to scheduling, but with the x264 SAD and other sched issues we have
> seen you might consider disabling sched1 as well for your uarch? I know
> that for our uarch we want to keep it on but we surely could have
> another generic-like mtune option that disables it (maybe even generic-ooo
> and change the current generic-ooo to generic-in-order?). I would
> expect this to get more common in the future anyway.
To expand a bit. The first scheduling pass is important for x264's SAD
because by exposing the load->use latency, it will tend to create
overlapping lifetimes for the pseudo registers holding the input vectors.
With the overlapping lifetimes the register allocator is then forced to
allocate the pseudos to different physical registers, thus ensuring
parallelism. The more out-of-order capable the target design is, the
less useful sched1 will be.
Jeff