Hello,

Following Henrik Grammer's comments, a new patch is attached:
- replace int size with ptrdiff_t size
- simplify the code, keeping only one loop (easier to read, with no real impact on speed)
- use the SBUTTERFLY macro for SSE
- for AVX2, keep my previous approach

Passes the fate-exr tests for me (OS X).

Current benchmark:

AVX2:
239920 decicycles in reorder_pixels_zip, 130958 runs, 114 skips
bench: utime=101.367s

SSE:
283768 decicycles in reorder_pixels_zip, 130948 runs, 124 skips
bench: utime=101.424s

Scalar:
3119101 decicycles in reorder_pixels_zip, 130429 runs, 643 skips
bench: utime=114.414s

Results with the asm suggested by Henrik:

AVX2: 258602 decicycles in reorder_pixels_zip, 130853 runs, 219 skips
SSE: 285167 decicycles in reorder_pixels_zip, 130863 runs, 209 skips

In terms of speed using -benchmark, the difference from the current patch is hard to see.

Martin
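
PS: for readers without the attachment at hand, here is a rough C sketch of the single-loop idea. It is not the code from the patch (the patch is x86 asm); the function names are made up for illustration. It assumes that size is even, that the first half of src holds the bytes for the even output positions, and that the second half holds the bytes for the odd ones. The SSE2 path uses _mm_unpacklo_epi8/_mm_unpackhi_epi8, which as far as I understand is the intrinsic counterpart of the punpcklbw/punpckhbw pair that SBUTTERFLY bw expands to.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Scalar reference (hypothetical name): interleave the two halves of src.
 * dst[2*i] = src[i], dst[2*i + 1] = src[half + i]. Assumes size is even. */
static void reorder_pixels_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t size)
{
    ptrdiff_t half = size / 2;
    for (ptrdiff_t i = 0; i < half; i++) {
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[half + i];
    }
}

/* Single-loop SIMD sketch (hypothetical name): 16 bytes from each half per
 * iteration when SSE2 is available, then a scalar loop for the remainder. */
static void reorder_pixels_sketch(uint8_t *dst, const uint8_t *src, ptrdiff_t size)
{
    ptrdiff_t half = size / 2;
    ptrdiff_t i = 0;
#ifdef __SSE2__
    for (; i + 16 <= half; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + half + i));
        /* Low and high byte interleaves give 32 consecutive output bytes. */
        _mm_storeu_si128((__m128i *)(dst + 2 * i),      _mm_unpacklo_epi8(a, b));
        _mm_storeu_si128((__m128i *)(dst + 2 * i + 16), _mm_unpackhi_epi8(a, b));
    }
#endif
    for (; i < half; i++) {
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[half + i];
    }
}
```

Unaligned loads and stores are used here only to keep the sketch safe on any buffer; the real asm may make different alignment and padding assumptions.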
0001-libavcodec-exr-add-X86-64-SIMD-for-reorder_pixels.patch