On Fri, Nov 13, 2015 at 7:17 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote: > Hi, > > On Fri, Nov 13, 2015 at 6:16 PM, Ganesh Ajjanagadde <gajja...@mit.edu> > wrote: > >> On Fri, Nov 13, 2015 at 4:52 PM, Ronald S. Bultje <rsbul...@gmail.com> >> wrote: >> > Hi, >> > >> > On Fri, Nov 13, 2015 at 4:28 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >> > wrote: >> > >> >> On Fri, Nov 13, 2015 at 1:28 PM, Ronald S. Bultje <rsbul...@gmail.com> >> >> wrote: >> >> > Hi, >> >> > >> >> > On Fri, Nov 13, 2015 at 12:17 PM, Ganesh Ajjanagadde < >> gajja...@mit.edu> >> >> > wrote: >> >> > >> >> >> On Fri, Nov 13, 2015 at 12:10 PM, Ronald S. Bultje < >> rsbul...@gmail.com> >> >> >> wrote: >> >> >> > Hi Ganesh, >> >> >> > On Nov 13, 2015 12:02 PM, "Ganesh Ajjanagadde" < >> >> gajjanaga...@gmail.com> >> >> >> > wrote: >> >> >> >> >> >> >> >> The rationale for this function is reflected in the documentation >> for >> >> >> >> it, and is copied here: >> >> >> >> >> >> >> >> Clip a double value into the long long amin-amax range. >> >> >> >> This function is needed because conversion of floating point to >> >> integers >> >> >> > when >> >> >> >> it does not fit in the integer's representation does not >> necessarily >> >> >> > saturate >> >> >> >> correctly (usually converted to a cvttsd2si on x86) which >> saturates >> >> >> > numbers >> >> >> >> > INT64_MAX to INT64_MIN. The standard marks such conversions as >> >> >> undefined >> >> >> >> behavior, allowing this sort of mathematically bogus conversions. >> >> This >> >> >> > provides >> >> >> >> a safe alternative that is slower obviously but assures safety and >> >> >> better >> >> >> >> mathematical behavior. >> >> >> >> API: >> >> >> >> @param a value to clip >> >> >> >> @param amin minimum value of the clip range >> >> >> >> @param amax maximum value of the clip range >> >> >> >> @return clipped value >> >> >> >> >> >> >> >> Note that a priori if one can guarantee from the calling side that >> >> the >> >> >> >> double is in range, it is safe to simply do an explicit/implicit >> >> cast, >> >> >> >> and that will be far faster. However, otherwise this function >> should >> >> be >> >> >> >> used. >> >> >> >> >> >> >> >> avutil minor version is bumped. >> >> >> >> >> >> >> >> Reviewed-by: Ronald S. Bultje <rsbul...@gmail.com> >> >> >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> >> >> >> >> --- >> >> >> >> libavutil/common.h | 30 ++++++++++++++++++++++++++++++ >> >> >> >> libavutil/version.h | 2 +- >> >> >> >> 2 files changed, 31 insertions(+), 1 deletion(-) >> >> >> >> >> >> >> >> diff --git a/libavutil/common.h b/libavutil/common.h >> >> >> >> index 6f0f582..f4687ab 100644 >> >> >> >> --- a/libavutil/common.h >> >> >> >> +++ b/libavutil/common.h >> >> >> >> @@ -298,6 +298,33 @@ static av_always_inline av_const double >> >> >> > av_clipd_c(double a, double amin, double >> >> >> >> else return a; >> >> >> >> } >> >> >> >> >> >> >> >> +/** >> >> >> >> + * Clip and convert a double value into the long long amin-amax >> >> range. >> >> >> >> + * This function is needed because conversion of floating point >> to >> >> >> > integers when >> >> >> >> + * it does not fit in the integer's representation does not >> >> necessarily >> >> >> > saturate >> >> >> >> + * correctly (usually converted to a cvttsd2si on x86) which >> >> saturates >> >> >> > numbers >> >> >> >> + * > INT64_MAX to INT64_MIN. The standard marks such conversions >> as >> >> >> > undefined >> >> >> >> + * behavior, allowing this sort of mathematically bogus >> conversions. >> >> >> > This provides >> >> >> >> + * a safe alternative that is slower obviously but assures safety >> >> and >> >> >> > better >> >> >> >> + * mathematical behavior. >> >> >> >> + * @param a value to clip >> >> >> >> + * @param amin minimum value of the clip range >> >> >> >> + * @param amax maximum value of the clip range >> >> >> >> + * @return clipped value >> >> >> >> + */ >> >> >> >> +static av_always_inline av_const int64_t av_rint64_clip_c(double >> a, >> >> >> > int64_t amin, int64_t amax) >> >> >> >> +{ >> >> >> >> +#if defined(HAVE_AV_CONFIG_H) && defined(ASSERT_LEVEL) && >> >> ASSERT_LEVEL >> >> >> >>= 2 >> >> >> >> + if (amin > amax) abort(); >> >> >> >> +#endif >> >> >> >> + // INT64_MAX+1,INT64_MIN are exactly representable as IEEE >> >> doubles >> >> >> >> + if (a >= 9223372036854775808.0 || llrint(a) >= amax) >> >> >> >> + return amax; >> >> >> >> + if (a <= -9223372036854775808.0 || llrint(a) <= amin) >> >> >> >> + return amin; >> >> >> > >> >> >> > Doesn't this allow negative overflows in the max check? I think you >> >> need >> >> >> > both overflow checks before the min/max checks. Try the next float >> val >> >> >> > smaller than int64_min as input with a small amax, eg 0. I bet it >> >> >> returns 0 >> >> >> > instead of amin. >> >> >> >> >> >> They are needed. As you and others can clearly see, I am quite bad >> >> >> with this stuff. >> >> > >> >> > >> >> > Hm, so, getting back to my computer, I wanted to test this, and I have >> >> this >> >> > problem: llrint() works correctly for me for the "undefined" cases, >> i.e., >> >> > it already does what you're trying to fix in av_rint64_clip_c. >> >> > >> >> > llrint(-10223372056756029440.000000) returns -9223372036854775808 >> >> > llrint(10223372056756029440.000000) returns 9223372036854775807 >> >> > >> >> > So, how do I reproduce that llrint() overflows? >> >> >> >> The link I gave originally >> >> >> http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer >> >> gives an illustration. Maybe the weird behavior happens only on >> >> 9223372036854775808.0. This happens because INT64_MAX+1 is not >> >> representable in long long, and hence signed overflow occurs yielding >> >> INT64_MIN (of course undefined). Here is a minimal test program: >> >> #include <stdio.h> >> >> #include <math.h> >> >> >> >> int main(void) { >> >> printf("%lld\n", llrint(9223372036854775808.0)); >> >> return 0; >> >> } >> >> >> > >> > 9223372036854775807 >> > >> > Seems apple's libc got one thing right :) >> >> I personally am not that charitable, looking more carefully at your >> asm shows a cmplesd, suggesting slowdown. Here is a source reference: >> https://opensource.apple.com/source/Libm/Libm-2026/Source/ARM/llrint.c. >> As usual, Apple dumps many implementations of llrint and it is unclear >> which is actually being used on OS X at the moment (see e.g other >> https://opensource.apple.com/source/Libm/Libm-92/i386.subproj/llrint.c), >> but I digress. >> >> They essentially all put special case code like the patch above. Thus >> their function is inherently slower than the conformant GNU libm >> implementation. A client may very well want a branch free llrint for >> speed. Apple offers no performance choice here, forcing a fast llrint >> to use cvt2dsi inline or equivalent. Don't know if FFmpeg is affected >> by this slowdown. > > > I think FFmpeg should consider using Apple's version as a x86 > implementation for av_rint64_clip :)
I don't agree with this: it is a far less readable implementation with many more lines of code, and worse yet only handles the llrint aspect and not the clipping. Regardless, belongs to a separate patch/thread. Pushed. Thanks all for reviews. > > Ronald > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel