> Isn't it also faster to max LMUL for the adds here?

It requires one more vset, which makes the time slightly longer: 147.7 (m1) vs. 148.7 (m8 + vset).
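For scale, here is the relative gap between the two timings quoted above. This is a quick sketch that assumes both figures come from the same benchmark run, use the same unit (the thread does not say which), and that lower is better:

```python
# Timings quoted above for the two variants (assumed: same benchmark,
# same unit, lower is better; the unit itself is not stated in the thread).
m1_time = 147.7        # LMUL=1, no extra vset
m8_vset_time = 148.7   # LMUL=8 plus one extra vset

# Relative slowdown of the m8 + vset variant versus m1, in percent.
slowdown_pct = (m8_vset_time - m1_time) / m1_time * 100
print(f"m8 + vset is {slowdown_pct:.2f}% slower than m1")
```

So the extra vset costs a bit under 1% here, which is why m1 was kept.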
> Also this might not be much noticeable on C908, but avoiding sequential
> dependencies on the address registers may help. I mean, avoid using as
> address operand a value that was calculated by the immediate previous
> instruction.

Okay, but the test results haven't changed. The change would add more than ten lines of code; perhaps the shorter code is better?

On Friday, March 8, 2024, 02:55, Rémi Denis-Courmont <r...@remlab.net> wrote:

> On Saturday, 2 March 2024, 14:06:13 EET, flow gg wrote:
> > Here adjusting the order, rather than simply using .rept, will be 13%-24%
> > faster.
>
> Isn't it also faster to max LMUL for the adds here?
>
> Also this might not be much noticeable on C908, but avoiding sequential
> dependencies on the address registers may help. I mean, avoid using as
> address operand a value that was calculated by the immediate previous
> instruction.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".