The following code shows what I mean. The Intel/MS intrinsic functions
generate simple lines or blocks of code because they are not aware of
their surroundings. In this case, moving the values into the SSE register
is very expensive: it not only ties up 3 extra registers but also consumes
a lot of cycles. There are many better options (e.g. the instruction that
copies the same value to all 4 ints, or building long long values first
and then constructing the vector from them), but intrinsics do only basic
substitutions. This is what I'm talking about with the cost of converting
between GP and SSE registers. It could also have been 4 different scalars
instead of the loop value. There is no opportunity for automated
optimization, with the exception of zeroing it.

 

for (int i = 0; i != bulk_count; i++)
{
    unsigned char* prt = ringBuffer->getSendDataPtr(header);

    // write message header...
    const __m128i Reg0 = _mm_set_epi32(i, i, i, i);
    //movd xmm0, esi
    //movd xmm1, esi
    //movd xmm2, esi
    //movd xmm3, esi
    //mov  DWORD PTR _prt$8074[ebp], eax
    //punpckldq xmm2, xmm0
    //punpckldq xmm3, xmm1
    //punpckldq xmm3, xmm2

    _mm_stream_si128((__m128i *) prt, Reg0);
    //movntdq     XMMWORD PTR [eax], xmm3

    _mm_stream_si128((__m128i *) prt + 1, Reg0);
    //movntdq     XMMWORD PTR [eax], xmm3

    _mm_stream_si128((__m128i *) prt + 2, Reg0);
    //movntdq     XMMWORD PTR [eax], xmm3

    // more code

    ringBuffer->compSend(size);

    // more code
}
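
For comparison, here is a minimal sketch of the "copy the same value to all
4 ints" option mentioned above. It assumes an SSE2 target and a 16-byte
aligned destination. The names broadcast_set1, broadcast_shuffle and
fill_header are hypothetical and not part of the original code. Compilers
commonly lower the broadcast to one movd plus one pshufd, but the exact
output depends on the compiler and flags, so check the generated assembly
rather than taking this as given.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in _mm_sfence)
#include <cstdint>

// Broadcast one 32-bit value into all four lanes using the dedicated
// "set1" intrinsic instead of _mm_set_epi32(v, v, v, v).
static inline __m128i broadcast_set1(std::int32_t v) {
    return _mm_set1_epi32(v);
}

// The same broadcast spelled out by hand: one GP-to-SSE move followed by
// a shuffle that replicates lane 0 into every lane.
static inline __m128i broadcast_shuffle(std::int32_t v) {
    const __m128i lane0 = _mm_cvtsi32_si128(v);  // corresponds to movd
    return _mm_shuffle_epi32(lane0, 0);          // corresponds to pshufd
}

// Hypothetical helper: write the broadcast value to three consecutive
// 16-byte blocks with non-temporal stores. dst must be 16-byte aligned
// and at least 48 bytes long.
static inline void fill_header(void* dst, std::int32_t v) {
    const __m128i reg = broadcast_shuffle(v);
    __m128i* p = static_cast<__m128i*>(dst);
    _mm_stream_si128(p, reg);
    _mm_stream_si128(p + 1, reg);
    _mm_stream_si128(p + 2, reg);
    _mm_sfence();  // make the streaming stores visible before returning
}
```

Both helpers cross from a GP register to an SSE register only once, which
is the expensive part in the listing above.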

 

 

Ben

 

From: [email protected] [mailto:[email protected]] On
Behalf Of Jonathan S. Shapiro
Sent: Sunday, August 15, 2010 3:53 AM
To: [email protected]; Discussions about the BitC language
Subject: Re: [bitc-dev] Bitc and Simd

 

On Sat, Aug 14, 2010 at 9:06 AM, Ben Kloosterman <[email protected]> wrote:

 >Part of the problem with anything even partial auto-vectorisation is
 >that the language expression semantics are often specified in terms of
 >promoting subtypes to machine ints/machine uints, doing computations,
 >then converting back to the data type.


Yes, this conversion is one of the biggest issues, and intrinsics do very
badly at it since they have little awareness of the surrounding code.


We may have a difference of understanding about what an intrinsic is. In my
view, an intrinsic is basically an instruction that has been taught to the
compiler.

Since BitC doesn't do implicit conversion, that shouldn't be an issue.



shap

 


_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
