On 5/30/23 16:01, 钟居哲 wrote:
I agree with Andrew.
And I don't think this patch is appropriate, for the following reasons:
1. This patch increases the vector workload on the machine, since
it converts scalar load + vmv.v.x into vmv.v.i + vsll.vi.
This is probably uarch dependent. I can probably construct cases where
the first will be better and I can probably construct cases where the
latter will be better. In fact the recommendation from our uarch team
is to generally do this stuff on the vector side.
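For anyone following along, the rewrite being discussed only applies when
the splatted constant can be written as a small immediate shifted left.
Below is a minimal sketch of that check, not code from the patch:
splat_as_imm_shift is a hypothetical name, and the ranges assume the
standard RVV encodings (5-bit signed immediate for vmv.v.i, 5-bit
unsigned shift amount for vsll.vi). It ignores the element width, which
would further limit the usable shift amounts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): try to express VALUE as
   IMM << SHIFT, where IMM fits vmv.v.i's 5-bit signed immediate
   (-16..15) and SHIFT fits vsll.vi's 5-bit unsigned immediate (0..31).
   Returns true and fills *IMM and *SHIFT with the smallest usable
   shift on success.  */
bool
splat_as_imm_shift (int64_t value, int *imm, int *shift)
{
  for (int s = 0; s <= 31; s++)
    {
      int64_t unit = (int64_t) 1 << s;

      /* Once VALUE is no longer a multiple of 2^s, no larger shift
         can work either.  */
      if (value % unit != 0)
        break;

      int64_t q = value / unit;
      if (q >= -16 && q <= 15)
        {
          *imm = (int) q;
          *shift = s;
          return true;
        }
    }
  return false;
}
```

When this succeeds with a nonzero shift, the constant is a candidate for
vmv.v.i + vsll.vi; a shift of 0 means vmv.v.i alone suffices. When it
fails, the scalar load + vmv.v.x form is the only one of the two
available. Whether the vector-side form is actually preferable is the
uarch question above.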
2. For a multi-issue OoO machine, scalar instructions are very cheap
when they are interleaved with vector code. For example, a sequence
like this:
scalar insn
scalar insn
vector insn
scalar insn
vector insn
....
In such a situation, we can issue multiple instructions simultaneously,
and the latency of the scalar instructions will be hidden, so the scalar
instructions are cheap. Whereas this patch, by increasing the vector
pipeline workload, is not friendly to the OoO machines I mentioned above.
I probably need to be careful what I say here :-) I'll go with: mixing
vector/scalar code may incur certain penalties on some
microarchitectures depending on the exact code sequences involved.
3. I can imagine the only benefit of this patch is that we can reduce
scalar register pressure in some extreme circumstances. However, I
don't think this benefit is "real", since GCC should schedule the
instruction sequence well once we have tuned the vector instruction
scheduling model and cost model, making such register live ranges very
short when the scalar register pressure is very high.
Overall, I disagree with this patch.
What I think this all argues is that it'll likely need to be uarch
dependent. I'm not yet sure how to describe the properties of the
uarch in a concise manner to put into our costing structure, though.
jeff