Hi

I've just done a fair bit of work on hevc_cabac decode for the Raspberry Pi 2, and I think the patch is generally applicable. The patch is attached, but you may prefer to take it from git:
https://github.com/jc-kynesim/rpi-ffmpeg.git
branch: test/ff_hevc_cabac_3
commit: 423e160e639d301feb2b4ba220199d112def0164

On the Pi 2, playing a 10Mbit 1080p H.265 clip (a bit of The Hobbit), it reduces the time in ff_hevc_hls_residual_coding (until transform) from ~26Gcycles to ~18Gcycles, and it almost halves the time spent in the "core" bit of the function (from decoding the greater1 bits to the end of decode). This was measured using the CPU cycle counter. Tests done at Raspberry Pi suggest that on their ffmpeg branch it reduces overall CPU loading by ~10% whilst playing H.265. I haven't profiled it on any other platform, but I would expect useful improvements on most streams on most platforms.

I have not yet run fate over it, as I haven't finished downloading the samples (the internet connection here isn't wildly fast), but I have run it against the H265.1 conformance streams on both x86 and ARM and it causes no regressions.

Known unknowns / possible issues:

1) I haven't tested it on anything with 64-bit ints (I don't have an appropriate machine). I've coded in a manner that should hopefully be OK there, but I can see that there might be issues.

2) Only tested on gcc 4.8 and later (5.1 & 5.3). I've used an anonymous union to avoid changing other cabac code. I could believe this is a no-no, in which case I'll have to change it.

3) Uses clz, which doesn't seem to exist in the ffmpeg int libs (though ctz does).

I'll happily accept suggestions as to what is considered better practice for these points.

Regards

John Cox
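P.S. On point 3, a minimal sketch of the kind of clz (count leading zeros) helper in question, in case it helps the discussion. This is not taken from the patch and is not an ffmpeg function: the name sketch_clz32 is hypothetical, the sketch assumes unsigned int is 32 bits wide on the gcc/clang path, and returning 32 for a zero input is a choice made here for illustration (the gcc builtin itself is undefined for 0).

```c
#include <stdint.h>

/* Hypothetical helper (not part of ffmpeg): number of leading zero bits
 * in a 32-bit value. Returns 32 for x == 0, which is a choice made for
 * this sketch; __builtin_clz itself is undefined for a zero argument. */
static inline int sketch_clz32(uint32_t x)
{
    if (x == 0)
        return 32;
#if defined(__GNUC__) || defined(__clang__)
    /* Assumes unsigned int is 32 bits, as on the x86 and ARM targets
     * mentioned above. */
    return __builtin_clz(x);
#else
    /* Portable fallback: shift left until the top bit is set. */
    int n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

For example, sketch_clz32(1) gives 31 and sketch_clz32(0x80000000u) gives 0. A compiler without the builtin falls through to the shift loop, which is slower but has no toolchain dependency.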
0001-H.265-residual-decode-performance-improvements.patch
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel