Hi Juzhe,

> ...
>         vsetvli zero,t1,e8,m1,ta,ma
>         vle8.v  v1,0(a4)
>         vsetvli t3,zero,e16,m2,ta,ma
>         vsext.vf2       v6,v1
>         vsetvli zero,t1,e8,m1,ta,ma
>         vle8.v  v1,0(a5)
>         vsetvli t3,zero,e16,m2,ta,ma
>         add     t0,a0,t4
>         vzext.vf2       v4,v1
>         vmul.vv v2,v4,v6
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v2,0(t0)
>         vle8.v  v1,0(a6)
>         vsetvli t3,zero,e16,m2,ta,ma
>         add     t0,a1,t4
>         vzext.vf2       v2,v1
>         vmul.vv v4,v2,v4
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v4,0(t0)
>         vsetvli t3,zero,e16,m2,ta,ma
>         add     t0,a2,t4
>         vmul.vv v2,v2,v6
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v2,0(t0)
>         add     t0,a3,t4
>         vle8.v  v1,0(a7)
>         vsetvli t3,zero,e16,m2,ta,ma
>         sub     t6,t6,t1
>         vsext.vf2       v2,v1
>         vmul.vv v2,v2,v6
>         vsetvli zero,t1,e16,m2,ta,ma
>         vse16.v v2,0(t0)
> ...
> 
> After this patch:
> ...
>         vsetvli zero,t1,e8,mf2,ta,ma
>         vle8.v  v1,0(a4)
>         vle8.v  v3,0(a5)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         add     t0,a0,t3
>         vwmulsu.vv      v2,v1,v3
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v2,0(t0)
>         vle8.v  v2,0(a6)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         add     t0,a1,t3
>         vwmulu.vv       v4,v3,v2
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v4,0(t0)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         add     t0,a2,t3
>         vwmulsu.vv      v3,v1,v2
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v3,0(t0)
>         add     t0,a3,t3
>         vle8.v  v3,0(a7)
>         vsetvli t6,zero,e8,mf2,ta,ma
>         sub     t4,t4,t1
>         vwmul.vv        v2,v1,v3
>         vsetvli zero,t1,e16,m1,ta,ma
>         vse16.v v2,0(t0)
> ...

I like the code examples in general but find them hard to read
once they run past 5-10 lines or so.  Could we condense this a bit?

> +(include "autovec-opt.md")
ACK for this.  We discussed before not cluttering the regular
autovec.md too much with combine-targeted patterns, so I'm in favor
of the separate file.

In total this looks good to me.  I'm a bit wary about getting the
costs right for combine patterns, but we can deal with this later.

Regards
 Robin
