Hello,

Following Henrik Gramner's comments, here is a new patch (attached).

Replaced int size with ptrdiff_t size.
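For reference, here is a minimal sketch of the scalar reordering that the SIMD code replaces. It assumes the function interleaves the first and second halves of the buffer byte by byte, as OpenEXR's ZIP/RLE decoding does. It also assumes size is even, since EXR channels are 16 or 32 bit. The name reorder_pixels_scalar is hypothetical; this is not the code from the attached patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: dst = t1[0], t2[0], t1[1], t2[1], ...
 * where t1 is the first half of src and t2 the second half.
 * Assumes size is even (16/32-bit EXR channels). */
static void reorder_pixels_scalar(const uint8_t *src, uint8_t *dst, ptrdiff_t size)
{
    const uint8_t *t1 = src;             /* first half  */
    const uint8_t *t2 = src + size / 2;  /* second half */
    ptrdiff_t i;

    for (i = 0; i < size / 2; i++) {
        *dst++ = *t1++;
        *dst++ = *t2++;
    }
}
```

ptrdiff_t is pointer-sized. On x86-64 the asm can therefore use the size argument directly as a 64-bit register, without first sign-extending a 32-bit int (whose upper bits are undefined in the calling convention).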

I simplified the code and kept only one loop. It is easier to read and has no real impact on speed.
I use the SBUTTERFLY macro for SSE; for AVX2 I kept my previous approach.
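The SBUTTERFLY idea for bytes is a punpcklbw/punpckhbw pair. It interleaves 16 bytes from each half into 32 output bytes. Below is a rough equivalent using SSE2 intrinsics. It is a sketch under stated assumptions, not the patch's asm. The function name is hypothetical. The scalar tail for the last < 16 bytes per half is an assumption about how leftovers could be handled. Size is assumed even.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SSE2 sketch mirroring SBUTTERFLY bw (punpcklbw/punpckhbw). */
static void reorder_pixels_sse2_sketch(const uint8_t *src, uint8_t *dst, ptrdiff_t size)
{
    ptrdiff_t half = size / 2;
    const uint8_t *t1 = src;         /* first half  */
    const uint8_t *t2 = src + half;  /* second half */
    ptrdiff_t i = 0;

    /* Main loop: 16 bytes from each half -> 32 interleaved output bytes. */
    for (; i + 16 <= half; i += 16) {
        __m128i a  = _mm_loadu_si128((const __m128i *)(t1 + i));
        __m128i b  = _mm_loadu_si128((const __m128i *)(t2 + i));
        __m128i lo = _mm_unpacklo_epi8(a, b);  /* a0 b0 a1 b1 ... a7 b7   */
        __m128i hi = _mm_unpackhi_epi8(a, b);  /* a8 b8 ... a15 b15       */
        _mm_storeu_si128((__m128i *)(dst + 2 * i),      lo);
        _mm_storeu_si128((__m128i *)(dst + 2 * i + 16), hi);
    }
    /* Assumed scalar tail for the remaining bytes of each half. */
    for (; i < half; i++) {
        dst[2 * i]     = t1[i];
        dst[2 * i + 1] = t2[i];
    }
}
```

A direct 256-bit port of this loop is not a drop-in change. The AVX2 unpack instructions interleave within each 128-bit lane, so the two output vectors would need an extra cross-lane permutation.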

It passes the fate-exr tests for me (OS X).

Current benchmark (this patch):
AVX2
239920 decicycles in reorder_pixels_zip,  130958 runs,    114 skips
bench: utime=101.367s

SSE
283768 decicycles in reorder_pixels_zip,  130948 runs,    124 skips
bench: utime=101.424s

Scalar
3119101 decicycles in reorder_pixels_zip,  130429 runs,    643 skips
bench: utime=114.414s

Results with the asm suggested by Henrik:
AVX2:
258602 decicycles in reorder_pixels_zip,  130853 runs,    219 skips

SSE:
285167 decicycles in reorder_pixels_zip,  130863 runs,    209 skips

In terms of speed measured with -benchmark, the difference from the current patch is
hard to see.
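To put the figures above side by side, here is a small helper that computes the ratios. It uses only the decicycle numbers quoted in this mail. The helper names are hypothetical.

```c
#include <stdio.h>

/* Decicycles per call, copied from the reorder_pixels_zip results in this mail. */
static const double scalar_dc      = 3119101.0;
static const double avx2_dc        = 239920.0;  /* this patch */
static const double sse_dc         = 283768.0;  /* this patch */
static const double henrik_avx2_dc = 258602.0;  /* Henrik's suggested asm */
static const double henrik_sse_dc  = 285167.0;  /* Henrik's suggested asm */

/* How many times fewer decicycles "candidate" needs than "baseline". */
static double speedup(double baseline, double candidate)
{
    return baseline / candidate;
}

static void print_ratios(void)
{
    printf("AVX2 vs scalar:            %.2fx\n", speedup(scalar_dc, avx2_dc));
    printf("SSE  vs scalar:            %.2fx\n", speedup(scalar_dc, sse_dc));
    printf("AVX2 vs SSE:               %.2fx\n", speedup(sse_dc, avx2_dc));
    printf("AVX2 this patch vs Henrik: %.3fx\n", speedup(henrik_avx2_dc, avx2_dc));
    printf("SSE  this patch vs Henrik: %.3fx\n", speedup(henrik_sse_dc, sse_dc));
}
```

By these numbers, the AVX2 and SSE paths need roughly 13x and 11x fewer decicycles than scalar. The two asm variants differ by about 8% (AVX2) and under 1% (SSE) in the function itself. That is consistent with the difference being hard to see in the overall -benchmark utime.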


Martin

Attachment: 0001-libavcodec-exr-add-X86-64-SIMD-for-reorder_pixels.patch
Description: Binary data

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
