Nihao,

On 14 November 2025 03:52:51 GMT+02:00, yunfei_zhou--- via ffmpeg-devel <[email protected]> wrote:
>Before proceeding, we would like to understand whether there are any existing
>or ongoing efforts in this area to avoid duplication and, ideally, align or
>collaborate with current initiatives.
You can find the existing code in the official Git repository. We are not aware of any ongoing efforts. You would probably do better to ask the RISE multimedia group than FFmpeg-devel. I suppose you or one of your colleagues should have access. (I don't think anyone here has.)

>* Available documentation or resources that could help us better understand the
>existing codebase and optimization strategies.

To be honest, in my experience, while it is obviously possible to optimise video decoding with RVV, the current RVV hardware implementations are not competitive (with, e.g., Armv8 AdvSIMD), mainly for two reasons:

1) Segmented loads and stores are slow. Because video decoding often involves transposition, we really need segmented unit-stride accesses to run as fast, or almost as fast, as single-segment unit-stride accesses of the same size. Likewise, we need segmented register-strided accesses to be almost as fast as single-segment register-strided accesses.

2) RVV is scalable, whereas video decoding uses a lot of small and/or fixed-size vectors, so we need instruction execution cost to scale with VL (or next_power_of_two(VL)). Currently it seems to scale with VLMAX instead, which means that longer vectors make the optimisations worse rather than better: a small fixed-size block does not get any bigger, but each instruction operating on it gets more expensive.

(This is based on benchmarks of your C910 and C908 cores and of SpacemiT's X60. I don't have access to any other hardware at the moment.)

Point being, the available hardware still seems somewhat immature, so we don't really have settled optimisation strategies.

Br,

_______________________________________________
ffmpeg-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
