I think that is true in most cases, but for this function specifically, reducing only once turns out to be slower.
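To make the comparison concrete, here is a minimal scalar C sketch of the two orderings being discussed. It is only an illustration: the function names are hypothetical (they are not FFmpeg symbols), the 16 "lanes" stand in for vector elements, and it says nothing about timing. It only shows that reducing after every row and reducing once at the end compute the same sum of absolute differences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: collapse each 16-pixel row to a scalar right away,
 * i.e. one reduction per row. */
static int sad16_reduce_per_row(const uint8_t *a, const uint8_t *b,
                                ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        int row = 0;
        for (int x = 0; x < 16; x++)
            row += abs(a[x] - b[x]);
        sum += row;              /* per-row reduction result */
        a += stride;
        b += stride;
    }
    return sum;
}

/* Hypothetical model: keep 16 per-lane accumulators across all rows and
 * reduce them a single time after the loop. */
static int sad16_reduce_once(const uint8_t *a, const uint8_t *b,
                             ptrdiff_t stride, int h)
{
    uint32_t lanes[16] = { 0 };
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            lanes[x] += (uint32_t)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    for (int x = 0; x < 16; x++) /* the single final reduction */
        sum += (int)lanes[x];
    return sum;
}

/* Both orderings must agree on the result; only their cost can differ. */
static void sad16_check_agree(const uint8_t *a, const uint8_t *b,
                              ptrdiff_t stride, int h)
{
    assert(sad16_reduce_per_row(a, b, stride, h) ==
           sad16_reduce_once(a, b, stride, h));
}
```

In the vector code, the reduce-once variant has to carry wider accumulators through the loop (e32 at m4 in the listing), which is one plausible reason it does not come out ahead here.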
The currently submitted version takes roughly:
pix_abs_0_0_rvv_i32: 136.2

The version that reduces only once takes:
pix_abs_0_0_rvv_i32: 169.2

Here is the implementation of the version that reduces only once:

func ff_pix_abs16_temp_rvv, zve32x
        vsetivli        zero, 16, e32, m4, ta, ma
        vmv.v.i         v24, 0
        vmv.s.x         v0, zero
1:
        vsetvli         zero, zero, e8, m1, tu, ma
        vle8.v          v4, (a1)
        vle8.v          v12, (a2)
        addi            a4, a4, -1
        vwsubu.vv       v16, v4, v12
        add             a1, a1, a3
        vwsubu.vv       v20, v12, v4
        vsetvli         zero, zero, e16, m2, tu, ma
        vmax.vv         v16, v16, v20
        add             a2, a2, a3
        vwadd.wv        v24, v24, v16
        bnez            a4, 1b

        vsetvli         zero, zero, e32, m4, ta, ma
        vwredsumu.vs    v0, v24, v0
        vmv.x.s         a0, v0
        ret
endfunc

Rémi Denis-Courmont <r...@remlab.net> wrote on Wed, Feb 7, 2024 at 00:58:

> Hi,
>
> To sum a vector, you should only reduce once at the end of the function,
> c.f.
> how it's done in existing scalar products. Reduction instructions are
> (intrinsically) slow.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/