On Tue, 29 Mar 2022, Ben Avison wrote:

>> Thirdly - the added test also occasionally fails for the other existing functions (armv6, neon) and the newly added aarch64 neon version. If you have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition will overflow, as there's no operation that both widens and clamps at the same time.
>
> So it does. I obviously just didn't hit those cases in my test runs!
>
> I can't easily test all codecs that use this function, but I just tried instrumenting the VC-1 case and it doesn't appear to actually use this particular function, so I'm none the wiser!
>
> Should I just limit the 16-bit values to +/-0x100 and re-enable the armv4 fast path then?

Yes, I think that'd be the safest path forward. Worst case, the test would be slightly too narrow and could miss some valid case - but that's at least better than having the test report failures for perfectly correct assembly that would work just fine for actual decoder use.
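
To make the overflow concrete, here is a rough sketch in plain C rather than assembly. All names in it (add_clamped_ref, add_clamped_wrap16, limit_coeff) are made up for illustration and are not taken from the FFmpeg tree, and the second function only models a path that does a wrapping 16-bit add before clamping; it is not a transcription of any of the existing implementations.

```c
#include <stdint.h>

/* Reference behaviour: add in full int precision, then clamp to 0..255. */
static uint8_t add_clamped_ref(int16_t coeff, uint8_t pixel)
{
    int sum = (int)coeff + (int)pixel;
    return sum < 0 ? 0 : sum > 255 ? 255 : (uint8_t)sum;
}

/* Hypothetical model of a SIMD path: the pixel is widened to 16 bits,
 * the add wraps modulo 2^16, and only then is the result clamped. */
static uint8_t add_clamped_wrap16(int16_t coeff, uint8_t pixel)
{
    /* Add as unsigned so the wraparound is well defined in C. */
    uint16_t wrapped = (uint16_t)((uint16_t)coeff + (uint16_t)pixel);
    /* Reinterpret the 16-bit pattern as a signed two's complement value. */
    int sum = wrapped >= 0x8000 ? (int)wrapped - 0x10000 : (int)wrapped;
    return sum < 0 ? 0 : sum > 255 ? 255 : (uint8_t)sum;
}

/* Hypothetical test helper: map an arbitrary random word into the
 * range -0x100..+0x100 proposed above. */
static int16_t limit_coeff(uint32_t raw)
{
    return (int16_t)((int)(raw % 0x201u) - 0x100);
}
```

With src = 32767 and dst = 255, the reference gives 255, while the wrapping model sees 33022 as -32514 and gives 0. With coefficients limited to +/-0x100, the sum always lies in -256..511, which fits in 16 bits, so the two functions agree for every pixel value.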

// Martin
