Thanks, Jeff, for the comments.

That makes sense to me. For the EQ operator we should have CONSTM1. Does this 
mean the s390 port has a similar issue here? If so, for instructions like 
VMSEQ we would need to adjust simplify_rtx to some extent.

Please correct me if I have made any mistakes. Thank you again.

Pan

-----Original Message-----
From: Jeff Law <jeffreya...@gmail.com> 
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 
<yanzhang.w...@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li <pan2...@intel.com>
> 
> When some RVV integer compare operators act on the same vector 
> register without a mask, they can be simplified to VMSET.
> 
> This patch allows eq, le, leu, ge, and geu to be simplified in this way 
> by adding one macro to the RISC-V port that simplify_rtx consumes.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
> {
>    return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1                  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As shown above, one instruction is eliminated and fewer vector 
> registers are required.
> 
> Signed-off-by: Pan Li <pan2...@intel.com>
> 
> gcc/ChangeLog:
> 
>       * config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
>         consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
>         Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding 
is we have one output bit per input element in the comparison.  So unless the 
number of elements matches the bit width of the mask register, this isn't going 
to work.
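
To illustrate the concern, here is a minimal sketch in plain C. It does not use RVV intrinsics or GCC internals; MASK_BITS, model_compare_eq and model_all_ones are hypothetical names invented for this example, and leaving the bits at and above vl as zero is an assumption of the model, not a statement about the ISA's tail policy. It only shows that a per-element compare result equals an all-ones constant when the element count matches the mask width, and differs otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a mask register that is MASK_BITS wide. */
#define MASK_BITS 64u

/* Model of an element-wise equality compare: one result bit per input
   element, for elements 0 .. vl-1.  Bits at and above vl are left as 0,
   which is an assumption of this sketch only.  */
static uint64_t model_compare_eq(const int8_t *a, const int8_t *b, unsigned vl)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < vl && i < MASK_BITS; i++)
        if (a[i] == b[i])
            mask |= (uint64_t)1 << i;
    return mask;
}

/* Model of a constant with every bit of the mask register set.  */
static uint64_t model_all_ones(void)
{
    return ~(uint64_t)0;
}

/* Self-check: comparing a vector with itself sets exactly vl bits.  */
static void model_self_check(void)
{
    int8_t v[MASK_BITS] = {0};

    /* 8 elements: only the low 8 bits are set, so not all-ones.  */
    assert(model_compare_eq(v, v, 8) == UINT64_C(0xff));
    assert(model_compare_eq(v, v, 8) != model_all_ones());

    /* Element count equals the mask width: the two agree.  */
    assert(model_compare_eq(v, v, MASK_BITS) == model_all_ones());
}
```

Calling model_self_check() from a test driver exercises both cases. Whether the bits above vl matter in practice depends on how the mask is consumed, which this sketch does not model.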

Am I missing something?

Jeff

