Vic,
On 6/10/2011 4:16 AM, Martin Fleisz wrote:
I am not quite sure how those _mm_* functions work internally, but if
they are really function calls, that will definitely hurt performance. I
think using SSE2 assembly instructions directly (like paddw) should be
much better.
Vic
The _mm_* functions are compiler intrinsics and map 1:1 to the
corresponding SSE instructions. It's just a nicer and cleaner interface
to the instruction set (and there is no function call overhead).
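As a quick illustration (a minimal sketch, not code from FreeRDP; the helper names add_epi16_x8 and check_add_epi16_x8 are made up for this example), here is _mm_add_epi16 adding eight 16-bit values at once. It is the intrinsic that corresponds to paddw:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (__m128i, _mm_add_epi16, ...) */
#include <stdint.h>

/* Hypothetical helper: add eight signed 16-bit lanes in one step. */
static void add_epi16_x8(const int16_t *a, const int16_t *b, int16_t *out)
{
    /* Unaligned loads/stores so the caller's buffers need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
}

/* Hypothetical self-check: the SIMD result must match plain scalar addition. */
static void check_add_epi16_x8(void)
{
    const int16_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    int16_t out[8];

    add_epi16_x8(a, b, out);
    for (int i = 0; i < 8; i++)
        assert(out[i] == (int16_t)(a[i] + b[i]));
}
```

Building this with something like gcc -O2 -msse2 -c and running objdump -d on the object file should show a paddw in place of the _mm_add_epi16 call, with no call instruction.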
-Martin
Martin beat me to it...
The _mm_* functions _do_ indeed get compiled down to SSE assembly
instructions.
Here is what the function compiles down to:
rfx_decode_YCbCr_to_RGB_SSE2():
b0: 55 push %ebp
b1: 89 e5 mov %esp,%ebp
b3: 8b 45 08 mov 0x8(%ebp),%eax
b6: 8b 4d 0c mov 0xc(%ebp),%ecx
b9: 8b 55 10 mov 0x10(%ebp),%edx
bc: 53 push %ebx
bd: 8d 98 00 20 00 00 lea 0x2000(%eax),%ebx
c3: 90 nop
c4: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
c8: 66 0f 6f 1d 70 00 00 movdqa 0x70,%xmm3
cf: 00
cc: R_386_32 .rodata.cst16
d0: 66 0f fd 18 paddw (%eax),%xmm3
d4: 66 0f 6f 12 movdqa (%edx),%xmm2
d8: 66 0f 6f e2 movdqa %xmm2,%xmm4
dc: 66 0f 71 e4 02 psraw $0x2,%xmm4
e1: 66 0f 6f f2 movdqa %xmm2,%xmm6
e5: 66 0f 71 e6 03 psraw $0x3,%xmm6
ea: 66 0f 6f ea movdqa %xmm2,%xmm5
ee: 66 0f 71 e5 05 psraw $0x5,%xmm5
f3: 66 0f 6f cb movdqa %xmm3,%xmm1
f7: 66 0f fd ca paddw %xmm2,%xmm1
fb: 66 0f 6f 01 movdqa (%ecx),%xmm0
ff: 66 0f fd cc paddw %xmm4,%xmm1
103: 66 0f ef e4 pxor %xmm4,%xmm4
107: 66 0f fd ce paddw %xmm6,%xmm1
10b: 66 0f fd cd paddw %xmm5,%xmm1
10f: 66 0f ee cc pmaxsw %xmm4,%xmm1
113: 66 0f 6f e0 movdqa %xmm0,%xmm4
117: 66 0f 71 e4 02 psraw $0x2,%xmm4
11c: 66 0f ea 0d 60 00 00 pminsw 0x60,%xmm1
123: 00
120: R_386_32 .rodata.cst16
124: 66 0f e7 08 movntdq %xmm1,(%eax)
128: 66 0f 6f c8 movdqa %xmm0,%xmm1
12c: 66 0f 6f c3 movdqa %xmm3,%xmm0
130: 66 0f 6f f9 movdqa %xmm1,%xmm7
134: 66 0f f9 c4 psubw %xmm4,%xmm0
138: 66 0f 71 e7 04 psraw $0x4,%xmm7
13d: 66 0f f9 c7 psubw %xmm7,%xmm0
141: 66 0f 6f f9 movdqa %xmm1,%xmm7
145: 66 0f 71 e7 05 psraw $0x5,%xmm7
14a: 66 0f f9 c7 psubw %xmm7,%xmm0
14e: 66 0f 6f fa movdqa %xmm2,%xmm7
152: 66 0f 71 e7 01 psraw $0x1,%xmm7
157: 66 0f f9 c7 psubw %xmm7,%xmm0
15b: 66 0f 71 e2 04 psraw $0x4,%xmm2
160: 66 0f f9 c6 psubw %xmm6,%xmm0
164: 83 c0 10 add $0x10,%eax
167: 66 0f f9 c2 psubw %xmm2,%xmm0
16b: 66 0f ef d2 pxor %xmm2,%xmm2
16f: 66 0f f9 c5 psubw %xmm5,%xmm0
173: 66 0f ee c2 pmaxsw %xmm2,%xmm0
177: 66 0f 6f d1 movdqa %xmm1,%xmm2
17b: 66 0f 71 e2 01 psraw $0x1,%xmm2
180: 66 0f ea 05 60 00 00 pminsw 0x60,%xmm0
187: 00
184: R_386_32 .rodata.cst16
188: 66 0f e7 01 movntdq %xmm0,(%ecx)
18c: 66 0f 6f c3 movdqa %xmm3,%xmm0
190: 66 0f fd c1 paddw %xmm1,%xmm0
194: 66 0f fd c2 paddw %xmm2,%xmm0
198: 66 0f 71 e1 06 psraw $0x6,%xmm1
19d: 66 0f fd c4 paddw %xmm4,%xmm0
1a1: 66 0f ef e4 pxor %xmm4,%xmm4
1a5: 83 c1 10 add $0x10,%ecx
1a8: 66 0f fd c1 paddw %xmm1,%xmm0
1ac: 66 0f ee c4 pmaxsw %xmm4,%xmm0
1b0: 66 0f ea 05 60 00 00 pminsw 0x60,%xmm0
1b7: 00
1b4: R_386_32 .rodata.cst16
1b8: 66 0f e7 02 movntdq %xmm0,(%edx)
1bc: 83 c2 10 add $0x10,%edx
1bf: 39 d8 cmp %ebx,%eax
1c1: 0f 85 01 ff ff ff jne c8 <rfx_decode_YCbCr_to_RGB_SSE2+0x18>
1c7: 5b pop %ebx
1c8: 5d pop %ebp
1c9: c3 ret
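Reading the listing back into C, the first (red) channel looks roughly like the sketch below. This is a reconstruction from the disassembly, not the actual FreeRDP source. The function and argument names are made up, and the two constants are assumptions: the listing loads them from .rodata.cst16, so their values (128 and 255 here) are not visible in it. The sketch also uses unaligned loads and stores for simplicity, where the listing uses movdqa and movntdq.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* ASSUMED values: the listing reads both constants from .rodata.cst16. */
#define Y_OFFSET 128
#define MAX_VAL  255

/* Hypothetical sketch: r = clamp(y + offset + cr + cr>>2 + cr>>3 + cr>>5),
 * eight pixels per iteration. n must be a multiple of 8. */
static void red_from_ycr_sse2(const int16_t *y, const int16_t *cr,
                              int16_t *r, size_t n)
{
    const __m128i offset = _mm_set1_epi16(Y_OFFSET);
    const __m128i zero   = _mm_setzero_si128();
    const __m128i max    = _mm_set1_epi16(MAX_VAL);

    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        __m128i vy  = _mm_add_epi16(_mm_loadu_si128((const __m128i *)(y + i)), offset); /* paddw */
        __m128i vcr = _mm_loadu_si128((const __m128i *)(cr + i));
        __m128i vr  = _mm_add_epi16(vy, vcr);                 /* paddw */
        vr = _mm_add_epi16(vr, _mm_srai_epi16(vcr, 2));       /* psraw $2, paddw */
        vr = _mm_add_epi16(vr, _mm_srai_epi16(vcr, 3));       /* psraw $3, paddw */
        vr = _mm_add_epi16(vr, _mm_srai_epi16(vcr, 5));       /* psraw $5, paddw */
        vr = _mm_max_epi16(vr, zero);                         /* pmaxsw */
        vr = _mm_min_epi16(vr, max);                          /* pminsw */
        _mm_storeu_si128((__m128i *)(r + i), vr);
    }
}

/* Scalar reference for one pixel. Relies on >> of a negative value being an
 * arithmetic shift, which is implementation-defined in C but is what gcc and
 * clang do; it matches psraw for inputs that do not overflow 16 bits. */
static int16_t red_from_ycr_scalar(int16_t y, int16_t cr)
{
    int v = y + Y_OFFSET + cr + (cr >> 2) + (cr >> 3) + (cr >> 5);
    return (int16_t)(v < 0 ? 0 : (v > MAX_VAL ? MAX_VAL : v));
}
```

The shift-and-add chain approximates a multiply by 1.40625 (1 + 1/4 + 1/8 + 1/32), and each intrinsic lines up with one instruction in the listing.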
Thanks,
Steve