On Fri, Dec 11, 2015 at 4:14 PM, Andreas Cadhalpun
<andreas.cadhal...@googlemail.com> wrote:
> On 07.12.2015 00:27, Ganesh Ajjanagadde wrote:
>> On Sun, Dec 6, 2015 at 6:12 PM, Andreas Cadhalpun
>> <andreas.cadhal...@googlemail.com> wrote:
>>> On 06.12.2015 22:48, Michael Niedermayer wrote:
>>>> my concern is more on h264 (CAVLC) and hevc speed
>>>
>>> I tested with 444_8bit_cavlc.h264, concatenated 100 times with the
>>> concat demuxer, and couldn't see a measurable speed difference caused
>>> by this error check.
>>
>> Ok, so here is my understanding of the situation.
>> I think both of you are right, but have different perspectives on this.
>> In practice the "log < 7" branch will usually be predicted correctly,
>> and the compiler will in all likelihood continue to inline the
>> function. Thus, excepting the first run, there should not be an issue,
>> and maybe the compiler even feeds in the "likely" information for the
>> first run as well.
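For readers following along, here is a rough, self-contained sketch of the
two-branch structure being discussed: branch A is the common short-code
path, branch B the long-code path where the proposed check goes. Names and
bit handling are simplified (sketch_get_ue_golomb is made up, __builtin_clz
stands in for av_log2, and 'buf' stands for the MSB-aligned 32-bit cache of
GetBitContext); this is not the actual libavcodec/golomb.h code:

    #include <stdint.h>
    #include <stdio.h>

    /* Branch A: short codes, resolved from the top 9 bits (a table lookup
     * in the real code).  Branch B: long codes, where the check lives. */
    static inline int sketch_get_ue_golomb(uint32_t buf)
    {
        if (buf >= (1U << 27)) {
            /* At most 4 leading zeros, so the whole code fits in 9 bits. */
            int log = 2 * (31 - __builtin_clz(buf)) - 31;
            return (int)((buf >> log) - 1);
        } else {
            /* 'log' is the shift that isolates the code. */
            int log = 2 * (31 - __builtin_clz(buf | 1)) - 31;
            if (log < 7) {
                /* Only corrupt/truncated input reaches this: 'log' can then
                 * be negative, and shifting by a negative amount is
                 * undefined behavior; values 0..6 would need more bits than
                 * the real bit cache guarantees to be valid. */
                fprintf(stderr, "Invalid UE golomb code\n");
                return -1;
            }
            return (int)((buf >> log) - 1);
        }
    }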
>> Nevertheless, I also understand Michael's perspective: h264 is
>> arguably one of the most important codecs supplied by FFmpeg. Even a
>> 0.01% speed loss in such a place should be accepted only with extreme
>> caution, since over time these can accumulate to something more
>> sizable, say 0.5%. There should be a very good justification for it,
>> and thus Michael spends effort trying to ensure that the security
>> issue is fixed while keeping the asm identical.
>
> I wouldn't call this a security issue, it's just undefined behavior.

Meant really from a theoretical perspective, since undefined means anything
can happen. Of course, in practice a distinction may be drawn. But then
again, I consider even these worthy of backporting.

>> I personally agree with Michael here due to h264's importance (note:
>> this is modulo Michael's fix being correct, which I can't comment
>> upon).
>
> Michael's proposed patch is more of a 'disable the warning' than a
> 'fix the warning': it is still not good if the condition happens, but
> no warning is emitted anymore.
>
>> I would have thought differently 6 months back, but with more work on
>> FFmpeg, I have shifted my views.
>> To understand this better, see e.g. commit b7fb7c45 by me: I got rid
>> of undefined behavior so as to not trigger ubsan, and substituted
>> implementation-defined behavior at zero cost that should be consistent
>> across platforms in practice. Yes, one could insert a branch, but
>> there is minimal utility: anyone feeding such extreme values is
>> fuzzing or supplying an irregular file in any case.
>
> In contrast to the change of b7fb7c45, this undefined behavior is not
> only triggered by fuzzed samples, but even by a FATE sample.
> Silencing has the disadvantage that it is no longer noticeable when a
> correct sample triggers this condition.

Ok, I just wanted this studied and explored carefully, that is all.

>> Also, to understand Michael's views better, see e.g.
>> https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2015-March/170611.html.
>> Even superficially "cosmetic" effects can have a cost. See also commit
>> 68e79b27a5ed7 by me. Even small things can matter.
>>
>> Really these things should be done with START/STOP timers, since the
>> change is in a tight inner construct.
>
> OK, so I did test with START/STOP timers in get_ue_golomb, one for the
> first branch (A) and one for the second (B).
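Roughly, the per-branch instrumentation looks like the sketch below, using
the START_TIMER/STOP_TIMER macros from libavutil/timer.h, which print the
"... decicycles in <id>, N runs, M skips" lines quoted underneath. The exact
placement is my assumption about how the measurement was done, applied to
the simplified sketch from above rather than to the real golomb.h code:

    #include <stdint.h>
    #include <stdio.h>
    #include "libavutil/timer.h"   /* START_TIMER / STOP_TIMER */

    static inline int sketch_get_ue_golomb_timed(uint32_t buf)
    {
        int ret;

        if (buf >= (1U << 27)) {
            START_TIMER                      /* branch A: common short codes */
            int log = 2 * (31 - __builtin_clz(buf)) - 31;
            ret = (int)((buf >> log) - 1);
            STOP_TIMER("get_ue_golomb A")    /* prints the "... A" lines below */
        } else {
            START_TIMER                      /* branch B: long codes + check */
            int log = 2 * (31 - __builtin_clz(buf | 1)) - 31;
            if (log < 7) {
                fprintf(stderr, "Invalid UE golomb code\n");
                ret = -1;
            } else {
                ret = (int)((buf >> log) - 1);
            }
            STOP_TIMER("get_ue_golomb B")    /* prints the "... B" lines below */
        }
        return ret;
    }

STOP_TIMER keeps static counters per expansion site, which, as noted below,
is also why several independent "B" counters show up once get_ue_golomb is
inlined in more than one place.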
> For h264 I used again 444_8bit_cavlc.h264, but concat'ed together 1000
> times.
>
> With the check in the B branch:
> 976 decicycles in get_ue_golomb B, 2047 runs, 1 skips
> 747 decicycles in get_ue_golomb B, 1024 runs, 0 skips
> 337 decicycles in get_ue_golomb A, 16777054 runs, 162 skips
>
> Without the check:
> 922 decicycles in get_ue_golomb B, 2046 runs, 2 skips
> 2309 decicycles in get_ue_golomb B, 1022 runs, 2 skips
> 341 decicycles in get_ue_golomb A, 16777036 runs, 180 skips
>
> (I shortened this, as the counters with few runs are not significant
> anyway. Also don't be confused by the fact that the B run count
> apparently decreases: get_ue_golomb gets inlined in several places, so
> there are several counters. However, the B counter with the second most
> runs has a factor of 1000 fewer runs, so it is basically negligible.)
>
> This shows multiple things:
> * The B branch is not executed often enough to get reproducible timings.
> * The A branch is executed about 5000 times more often than the B branch.
>   (That's what I meant by the less likely branch.)
>
> I also checked the cavs decoder, using cavs.mpg concat'ed 100 times.
>
> With the check in the B branch:
> 629 decicycles in get_ue_golomb B, 4194260 runs, 44 skips
> 433 decicycles in get_ue_golomb A, 268434102 runs, 1354 skips
>
> Without the check:
> 624 decicycles in get_ue_golomb B, 4194273 runs, 31 skips
> 433 decicycles in get_ue_golomb A, 268434203 runs, 1253 skips
>
> This shows:
> * The B branch gets about 0.8% slower with the check.
> * The A branch is executed about 70 times more often than the B branch.
> * So on average, get_ue_golomb gets 0.8%/70 = 0.01% slower for cavs.
> * The B branch is already about 50% slower than the A branch.
>
> Extrapolating this to the h264 decoder, it would get slower by about
> 0.8%/5000 = 0.00016%, which is negligible.
> And that's only the runtime of get_ue_golomb; the decoder also does
> other things besides calling this function.
>
> Thus I still think that adding the error check here has practically no
> speed cost and is therefore the better alternative.

If it saves effort, maybe the level of detail of the analysis can be
reduced somewhat in the future :). Thanks for your effort.

> Best regards,
> Andreas

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel