ping

flow gg <hlefthl...@gmail.com> wrote on Tue, 30 Jan 2024 at 00:22:
> > I expect that it would be faster to make one large load, and then 4 small
> > stores, but that might work only for exactly 128-bit vectors?
>
> This seems to require vle128, so I didn't modify it.
>
> > That's not needed. You can use immediate values.
> > You can reorder to avoid immediate data dependencies on the addresses.
> > In any case, you need to check the vector length in init.
>
> Okay, I've updated it in the reply.
>
> Rémi Denis-Courmont <r...@remlab.net> wrote on Mon, 29 Jan 2024 at 23:41:
>
>> Hi,
>>
>> +/*
>> + * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences (ISCAS).
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#include "libavutil/riscv/asm.S"
>> +
>> +func ff_get_pixels_8x4_sym_rvv, zve64x
>> +        vsetivli      zero, 8, e8, mf2, ta, ma
>> +        vlse64.v      v16, (a1), a2
>> +        li            t0, 8 * 8
>> +        vsetvli       zero, t0, e16, m4, ta, ma
>> +        vzext.vf2     v8, v16
>> +        vse16.v       v8, (a0)
>> +        li            a2, 8*2
>>
>> That's not needed. You can use immediate values.
>>
>> +        vsetivli      zero, 2, e8, mf8, ta, ma
>> +        addi          a1, a0, 48
>> +        addi          a0, a0, 32*2
>> +        vle64.v       v0, (a1)
>> +        vse64.v       v0, (a0)
>> +        sub           a1, a1, a2
>> +        vle64.v       v0, (a1)
>> +        add           a0, a0, a2
>> +        vse64.v       v0, (a0)
>> +        sub           a1, a1, a2
>> +        vle64.v       v0, (a1)
>> +        add           a0, a0, a2
>> +        vse64.v       v0, (a0)
>> +        sub           a1, a1, a2
>> +        vle64.v       v0, (a1)
>> +        add           a0, a0, a2
>> +        vse64.v       v0, (a0)
>>
>> You can reorder to avoid immediate data dependencies on the addresses.
>>
>> I expect that it would be faster to make one large load, and then 4 small
>> stores, but that might work only for exactly 128-bit vectors?
>>
>> In any case, you need to check the vector length in init.
>>
>> +
>> +        ret
>> +endfunc
>>
>> --
>> 雷米‧德尼-库尔蒙
>> http://www.remlab.net/
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
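For anyone following the thread, here is a rough scalar C model of the second half of the quoted routine (the four load/store pairs) and of why the vector length matters. It is only a sketch read off the assembly above, not FFmpeg's C reference; the names mirror_rows_8x4_sym_ref, bytes_per_row_copy and ROW_BYTES are made up for illustration, and the widening load in the first half is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_BYTES 16 /* 8 coefficients x sizeof(int16_t), the "8*2" in the patch */

/*
 * Hypothetical scalar model of the mirror stage: rows 0..3 of an 8x8
 * int16_t block are already filled, and rows 4..7 become their mirror
 * image (row 4 = row 3, row 5 = row 2, row 6 = row 1, row 7 = row 0).
 */
static void mirror_rows_8x4_sym_ref(int16_t block[64])
{
    assert(block != NULL);
    for (int i = 0; i < 4; i++) {
        /* source starts at byte offset 48 and steps back 16 bytes per row */
        const int16_t *src = block + (3 - i) * 8;
        /* destination starts at byte offset 64 and steps forward 16 bytes */
        int16_t *dst = block + (4 + i) * 8;
        memcpy(dst, src, ROW_BYTES);
    }
}

/*
 * Bytes moved by one vle64.v/vse64.v pair after
 * "vsetivli zero, 2, e8, mf8". VLMAX is VLEN / 8 / 8 = VLEN / 64, so the
 * requested vl of 2 is only granted when VLEN is at least 128 bits.
 */
static size_t bytes_per_row_copy(unsigned vlen_bits)
{
    unsigned vlmax = vlen_bits / 64;     /* e8 with LMUL = 1/8 */
    unsigned vl = vlmax < 2 ? vlmax : 2; /* AVL of 2 clamped to VLMAX */
    return (size_t)vl * 8;               /* each loaded element is 64 bits */
}
```

Under this model, bytes_per_row_copy(64) gives 8 rather than 16, so a 64-bit vector implementation (which zve64x permits) would copy only half of each row. That appears to be the case the request to check the vector length in init is meant to exclude.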