vf_bwdif: Add aarch64 neon functions

Martin Storsjö Wed, 05 Jul 2023 14:20:09 -0700

On Tue, 4 Jul 2023, John Cox wrote:

Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy


Differences from v3:
Remove a few lines of neon in filter_line that should have been removed
when copying from line3

Sorry about the two patch sets in quick succession, but I think I've
applied all the requested changes and I didn't want this mistake in the
final patchset. (The mistake was benign - it just wasted a few cycles.)

John Cox (7):
 tests/checkasm: Add test for vf_bwdif filter_intra
 avfilter/vf_bwdif: Add neon for filter_intra
 tests/checkasm: Add test for vf_bwdif filter_edge
 avfilter/vf_bwdif: Add neon for filter_edge
 avfilter/vf_bwdif: Add neon for filter_line
 avfilter/vf_bwdif: Add a filter_line3 method for optimisation
 avfilter/vf_bwdif: Add neon for filter_line3

I think this looks ok to me, so I'll go ahead and push it. The tests
pass on x86 too, msvc/aarch64, llvm-mingw/aarch64, macOS and linux.


Just a couple notes I didn't remember to mention before:

- Regarding the int parameters on the stack; as long as you do have the C
wrapper functions, you don't strictly need to have the same function
signature for the NEON function as for the actual DSP function. So if
you'd have wanted to have a different signature for the NEON function
(changing it to intptr_t), that would have worked too. But I do see the
benefit of keeping it identical to the DSP function interface.

- The way of making the C function exported and calling that for the
tail is neat, but kinda unusual within ffmpeg. In most cases (except for
parts of swscale), we can just assume and rely on buffers being aligned
enough for the SIMD vector length of the current platform, and freely
overwrite a little into the padding at the end of the lines. Not sure if
this is the case here though.

(If it is, it's easy enough to remove those bits and make the C functions
static again as a follow-up.)
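
To illustrate the two notes above, here is a minimal sketch in plain C of a
wrapper that keeps a DSP-style signature (int width), hands the whole
16-pixel blocks to an inner routine with an intptr_t parameter, and calls
the exported C function for the tail. All names are hypothetical, the
averaging kernel is only a placeholder for the real bwdif filter, and the
inner routine is ordinary C standing in for NEON assembly; this is not the
code from the patch set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the NEON routine. It only ever sees whole
 * 16-pixel blocks and takes intptr_t, so its signature is free to differ
 * from the DSP function pointer as long as a C wrapper sits in between. */
static void avg_blocks16(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         intptr_t w16)
{
    for (intptr_t x = 0; x < w16; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

/* Scalar C version, exported (not static) so the wrapper can reuse it
 * for the tail. Placeholder kernel: rounded average of two lines. */
void avg_line_c(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

/* Wrapper with the (assumed) DSP signature. Assumes w >= 0. */
void avg_line_wrapped(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    int w16 = w & ~15;              /* largest multiple of 16 <= w */

    if (w16 > 0)
        avg_blocks16(dst, a, b, w16);
    if (w > w16)                    /* 1..15 leftover pixels */
        avg_line_c(dst + w16, a + w16, b + w16, w - w16);
}
```

With this arrangement nothing is written past dst + w. If the buffers were
known to be padded to the vector length, the tail call could be dropped and
the block routine simply run over the rounded-up width instead.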

Also, checkasm coverage for >8bpp would be nice as mentioned, but if
someone wants to write asm for that, it should be doable to factorize the
new tests to run them for both 8 and 16 bpp.
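
As a rough sketch of what such a factorization could look like, the
standalone C below runs one comparison harness for both sample sizes, with
the bytes per sample and the maximum sample value as the only parameters.
It deliberately does not use the checkasm macros; the function names, the
placeholder averaging kernel and the second "alt" implementation are all
assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef void (*line_fn)(void *dst, const void *a, const void *b, int w);

/* Reference kernels: rounded average, 8-bit and 16-bit storage. */
void avg8_ref(void *dst, const void *a, const void *b, int w)
{
    uint8_t *d = dst; const uint8_t *pa = a, *pb = b;
    for (int x = 0; x < w; x++)
        d[x] = (uint8_t)((pa[x] + pb[x] + 1) >> 1);
}

void avg16_ref(void *dst, const void *a, const void *b, int w)
{
    uint16_t *d = dst; const uint16_t *pa = a, *pb = b;
    for (int x = 0; x < w; x++)
        d[x] = (uint16_t)((pa[x] + pb[x] + 1) >> 1);
}

/* Alternative kernels standing in for an asm version; they use the
 * identity (a + b + 1) >> 1 == (a | b) - ((a ^ b) >> 1). */
void avg8_alt(void *dst, const void *a, const void *b, int w)
{
    uint8_t *d = dst; const uint8_t *pa = a, *pb = b;
    for (int x = 0; x < w; x++)
        d[x] = (uint8_t)((pa[x] | pb[x]) - ((pa[x] ^ pb[x]) >> 1));
}

void avg16_alt(void *dst, const void *a, const void *b, int w)
{
    uint16_t *d = dst; const uint16_t *pa = a, *pb = b;
    for (int x = 0; x < w; x++)
        d[x] = (uint16_t)((pa[x] | pb[x]) - ((pa[x] ^ pb[x]) >> 1));
}

/* Shared harness: returns 1 if both functions produce identical output
 * on pseudo-random input. max_val must be of the form 2^n - 1. */
int check_line_fn(line_fn ref, line_fn alt, int bytes_per_sample,
                  unsigned max_val)
{
    enum { W = 67 };                       /* odd width on purpose */
    uint16_t a[W], b[W], out_ref[W], out_alt[W];
    unsigned seed = 1;

    for (int i = 0; i < W; i++) {
        seed = seed * 1103515245u + 12345u;
        unsigned va = (seed >> 16) & max_val;
        seed = seed * 1103515245u + 12345u;
        unsigned vb = (seed >> 16) & max_val;
        if (bytes_per_sample == 1) {
            ((uint8_t *)a)[i] = (uint8_t)va;
            ((uint8_t *)b)[i] = (uint8_t)vb;
        } else {
            a[i] = (uint16_t)va;
            b[i] = (uint16_t)vb;
        }
    }
    memset(out_ref, 0, sizeof(out_ref));
    memset(out_alt, 0, sizeof(out_alt));
    ref(out_ref, a, b, W);
    alt(out_alt, a, b, W);
    return memcmp(out_ref, out_alt, (size_t)W * bytes_per_sample) == 0;
}
```

A caller would then invoke check_line_fn(avg8_ref, avg8_alt, 1, 255) for
8 bpp and check_line_fn(avg16_ref, avg16_alt, 2, 1023) for 10-bit samples
stored in 16 bits.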


That said, it looks ok enough to me so I'll push it.

// Martin


Re: [FFmpeg-devel] [PATCH v4 0/7] avfilter/vf_bwdif: Add aarch64 neon functions
