On Thu, Sep 25, 2025 at 5:36 PM 钟居哲 <[email protected]> wrote:
> Since Originally I designed the VSETVL PASS is able to do fusion under this
> condition:
>
> insn 1: TAIL_ANY
> insn 2: TAIL_UNDISTURBED
>
> fuse successfully -> TAIL_UNDISTURBED

On out-of-order (OoO) microarchitectures we don't want fusion in this case. Fusing would apply the tail-undisturbed (tu) policy to many subsequent vector operations that don't need it, and under tu each result also depends on the old value of its destination register, which costs performance. We only want `vsetvli ... tu` immediately before the operations that actually require this policy, even if that means emitting more vset instructions.

With this patch applied, our uarch testing shows a 30% performance uplift on SPEC2017 510.parest_r.

BTW, in tune_info I've set the new field to false for all in-order cores and true for all out-of-order cores.
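
To make the intended behaviour concrete, here is a minimal standalone sketch of the fusion decision described above. It is not the code from the patch or from the VSETVL pass: `TailPolicy`, `TuneInfo`, `avoid_tu_fusion` and `fuse_tail_policy` are hypothetical names chosen for illustration, and the sketch assumes the tuning field is a single boolean.

```cpp
#include <optional>

// Hypothetical model of the tail-policy demands; these are NOT GCC's identifiers.
enum class TailPolicy { Any, Agnostic, Undisturbed };

// Hypothetical stand-in for the per-core tuning record.
struct TuneInfo {
  // Assumption: true on out-of-order cores, false on in-order cores.
  bool avoid_tu_fusion;
};

// Returns the policy a single shared vsetvli would use, or std::nullopt
// when the two instructions should keep separate vsetvli instructions.
std::optional<TailPolicy> fuse_tail_policy(TailPolicy first, TailPolicy second,
                                           const TuneInfo &tune) {
  if (first == second)
    return first;  // identical demands always fuse

  const bool first_any = (first == TailPolicy::Any);
  const bool second_any = (second == TailPolicy::Any);
  if (!first_any && !second_any)
    return std::nullopt;  // Agnostic vs. Undisturbed: incompatible demands

  // Exactly one side is Any, so the other side dictates the policy.
  const TailPolicy demanded = first_any ? second : first;

  // On cores that ask for it, do not let tu spread to an insn that does not need it.
  if (demanded == TailPolicy::Undisturbed && tune.avoid_tu_fusion)
    return std::nullopt;

  return demanded;
}
```

With `avoid_tu_fusion` false (the in-order setting), TAIL_ANY followed by TAIL_UNDISTURBED still fuses to TAIL_UNDISTURBED as in the original design; with it true, the same pair is left unfused so that tu is set only where it is required.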
