On Sun, Aug 24, 2014 at 08:46:30AM +0000, Christophe Gisquet wrote:
> In some cases, 2 or 3 calls are performed to functions for unusual
> widths. Instead, perform 2 calls for different widths to split the
> workload.
>
> The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't
> be processed that way without modifications: some calls use unaligned
> buffers, and having branches to handle this was resulting in no
> micro-benchmark benefit.
>
> For block_w == 12 (around 1% of the pixels of the sequence):
> Before:
> 12758 decicycles in epel_uni, 4093 runs, 3 skips
> 19389 decicycles in qpel_uni, 8187 runs, 5 skips
> 22699 decicycles in epel_bi, 32743 runs, 25 skips
> 34736 decicycles in qpel_bi, 32733 runs, 35 skips
>
> After:
> 11929 decicycles in epel_uni, 4096 runs, 0 skips
> 18131 decicycles in qpel_uni, 8184 runs, 8 skips
> 20065 decicycles in epel_bi, 32750 runs, 18 skips
> 31458 decicycles in qpel_bi, 32753 runs, 15 skips
> ---
>  libavcodec/x86/hevcdsp_init.c | 43 +++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 41 insertions(+), 2 deletions(-)
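
For readers following the thread, the idea in the quoted commit message (cover an unusual width with two calls to kernels of different widths) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the names copy_block, put_w8, put_w4 and put_w12 are hypothetical, the kernels merely copy pixels instead of interpolating, and the 8+4 ordering for width 12 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a width-specialized kernel: it only copies
 * `width` pixels per row, whereas a real kernel would filter them. */
static void copy_block(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int height, int width)
{
    assert(height > 0 && width > 0);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * dst_stride + x] = src[y * src_stride + x];
}

/* Hypothetical 8-pixel-wide kernel. */
static void put_w8(uint8_t *dst, ptrdiff_t dst_stride,
                   const uint8_t *src, ptrdiff_t src_stride, int height)
{
    copy_block(dst, dst_stride, src, src_stride, height, 8);
}

/* Hypothetical 4-pixel-wide kernel. */
static void put_w4(uint8_t *dst, ptrdiff_t dst_stride,
                   const uint8_t *src, ptrdiff_t src_stride, int height)
{
    copy_block(dst, dst_stride, src, src_stride, height, 4);
}

/* Width 12 handled as 8 + 4: one call per sub-width (two calls in total)
 * rather than three calls to the 4-wide kernel. The second call starts
 * 8 pixels into each row (assumed split order). */
static void put_w12(uint8_t *dst, ptrdiff_t dst_stride,
                    const uint8_t *src, ptrdiff_t src_stride, int height)
{
    put_w8(dst,     dst_stride, src,     src_stride, height);
    put_w4(dst + 8, dst_stride, src + 8, src_stride, height);
}
```

Note that the second call works on a pointer offset by 8 pixels, so it may no longer satisfy the alignment the first call enjoys; the quoted message names unaligned buffers as the reason the 8+16 and 4+8 cases could not be handled this way without modifications.
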

applied

thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel