On Tue, 29 Mar 2022, Ben Avison wrote:
>> Thirdly - the added test also occasionally fails for the other existing
>> functions (armv6, neon) and the newly added aarch64 neon version. If you
>> have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition
>> will overflow, as there's no operation that both widens and clamps at
>> the same time.
>
> So it does. I obviously just didn't hit those cases in my test runs!
> I can't easily test all codecs that use this function, but I just tried
> instrumenting the VC-1 case and it doesn't appear to actually use this
> particular function, so I'm none the wiser!
>
> Should I just limit the 16-bit values to +/-0x100 and re-enable the
> armv4 fast path then?
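To make the overflow concrete, here is a minimal scalar C model of it. It assumes the function under test adds signed 16-bit values (src[]) to unsigned 8-bit pixels (dst[]) and clamps the result to 0..255. The helper names are hypothetical and are not FFmpeg's. add_clamped_wrapping() mimics a widening 8->16 add whose result stays in a 16-bit lane, so it wraps, and is only clamped afterwards. That is roughly what, for instance, a NEON uaddw followed by a saturating narrow such as sqxtun would do:

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar reference: the sum is computed in int, so it cannot overflow. */
static uint8_t add_clamped_ref(int16_t src, uint8_t dst)
{
    return clip_uint8(src + dst);
}

/* Hypothetical model of a widening 8->16 add: the sum is kept in a 16-bit
 * lane, so it wraps modulo 2^16 and is read back as signed before the clamp. */
static uint8_t add_clamped_wrapping(int16_t src, uint8_t dst)
{
    unsigned lane = (unsigned)(src + dst) & 0xFFFFu; /* keep the low 16 bits */
    int sum = (int)lane;
    if (sum >= 0x8000)   /* sign bit set: the 16-bit value is negative */
        sum -= 0x10000;
    return clip_uint8(sum);
}
```

With src = 32767 and dst = 255 the exact sum 33022 does not fit in a signed 16-bit lane. It wraps to -32514 and gets clamped to 0, whereas the reference returns 255. In this model only large positive src values can overflow this way.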
Yes, I think that'd be the safest path forward. Worst case, the test would
be slightly too narrow and could miss some valid case - but that's at
least better than having the test give false positives for perfectly
correct assembly, which would work just fine for actual decoder use.
// Martin
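The agreed fix bounds the test input rather than changing the assembly. Here is a sketch that assumes the test draws each 16-bit value uniformly from -0x100..0x100, treating the bound as inclusive (an assumption). COEFF_LIMIT, random_coeff() and bound_is_safe() are hypothetical names. With dst in 0..255 the exact sum stays within -256..511, well inside the int16 range, so a wrap-then-clamp implementation always agrees with the exact result. The exhaustive loop checks every input pair within the bound:

```c
#include <stdint.h>
#include <stdlib.h>

#define COEFF_LIMIT 0x100 /* hypothetical name for the +/-0x100 bound */

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Same wrap-then-clamp model of a widening 8->16 add as before. */
static uint8_t add_clamped_wrapping(int16_t src, uint8_t dst)
{
    unsigned lane = (unsigned)(src + dst) & 0xFFFFu; /* keep the low 16 bits */
    int sum = (int)lane;
    if (sum >= 0x8000)   /* sign bit set: the 16-bit value is negative */
        sum -= 0x10000;
    return clip_uint8(sum);
}

/* Draw a test value from -COEFF_LIMIT..COEFF_LIMIT (inclusive). */
static int16_t random_coeff(void)
{
    return (int16_t)(rand() % (2 * COEFF_LIMIT + 1) - COEFF_LIMIT);
}

/* Returns 1 if, for every src within the bound and every dst, the wrapping
 * model matches the exact int result; 0 on the first mismatch. */
static int bound_is_safe(void)
{
    for (int src = -COEFF_LIMIT; src <= COEFF_LIMIT; src++)
        for (int dst = 0; dst <= 255; dst++)
            if (add_clamped_wrapping((int16_t)src, (uint8_t)dst) !=
                clip_uint8(src + dst))
                return 0;
    return 1;
}
```

The trade-off is the one described above. Values outside the bound are never exercised, so the test may be slightly narrower than strictly necessary. However, it no longer fails on correct code that cannot both widen and clamp in one operation.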
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel