Iain Buclaw Wrote:

> == Quote from Mike Farnsworth ([email protected])'s article
> > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > the code generator did with it. It was able to partially drop into SSE
> > registers and instructions, but not as well as I had hoped from writing
> > "regular" D code.
> >
> > I poked through the builtins that get pulled into d-builtins.c /
> > d-builtins2.cc but I don't see anything that might be pulling in
> > definitions such as __builtin_ia32_* for SSE, for example.
> >
> > How hard would it be to get some sort of vector attribute attached to a
> > type (or just plain introduce v4sf, __m128, or something like that) and
> > get those SIMD builtins available?
> >
> > For the curious, here is how they are defined in, for example,
> > xmmintrin.h for gcc:
> >
> > typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
> > typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> >
> > extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> > __artificial__))
> > _mm_add_ps (__m128 __A, __m128 __B)
> > {
> >   return (__m128) __builtin_ia32_addps ((__v4sf)__A, (__v4sf)__B);
> > }
>
> Although GDC hashes out GCC builtins and attributes, most of it is very much
> incomplete. For example, a D version (for GDC) of the code above would be
> something like:
>
> import gcc.builtins;
>
> pragma(set_attribute, __m128, vector_size(16), may_alias);
> pragma(set_attribute, __v4sf, vector_size(16));
> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
>
> typedef float __m128;
> typedef float __v4sf;
>
> __m128 _mm_add_ps (__m128 __A, __m128 __B)
> {
>     return cast(__m128) __builtin_ia32_addps (cast(__v4sf)__A,
>                                               cast(__v4sf)__B);
> }
>
> However, this doesn't work because
>
> 1) There is no 128bit float type in DMDFE (can be put in though, even if it is
>    just for internal use).
> 2) Vectors are not representable in DMDFE.
>
> So __builtin_ia32_addps (and many other ia32 builtins) cannot be emitted to
> the D environment.

I figured this would be the case; the "typedef float whatever
__attribute((vector_size(16)))" stuff is already weird, so I don't expect
dmdfe to do the right thing with even similar syntax at all.

> Interestingly enough, this particular example actually ICEs the compiler. It
> appears that while *explicit* casting is done in the code, DMDFE actually
> *ignores* this, which is terrible on DMD's part...

Hah. It's obvious dmdfe doesn't understand the builtin's signature correctly,
so I'll hold off on a bug report until I can figure out what kind of
signature that builtin had registered with dmdfe.

> Saying that, a workaround is to use array types.
>
> typedef float[4] __m128;
> typedef float[4] __v4sf;
>
> All the more reason to show you that pragma(attribute) is still very
> incomplete to use. Any ideas to improve it are welcome though. :)

In my (not very abundant) spare time, I'll poke around the attribute stuff to
see if I can attach the vector_size(16) attribute to a float[4] array type. I
know the __builtin_ia32_addps function, for example, takes a v4sf (__m128 is
just Intel's version that can change personalities at will; I feel no
inclination to keep it around, and would rather go with more strictly defined
types and cast intrinsics). If I can get that builtin to take a typedef'd
float[4] without a cast, perhaps dmdfe will not drop any data and the codegen
will happen properly.

Where do I look to see the attribute pragmas in gdc? Where do I look to
potentially change the signature that dmdfe sees for the __builtin_ia32_*
functions? If I can get a hand-coded signature to work, then we'll be in
business.

-Mike
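
For reference, here is a minimal C sketch of what the vector_size(16) attribute gives you in GCC itself, since that is the behaviour the float[4] experiment would be trying to reproduce from D. It is only an illustration under a few assumptions: the names v4sf_load, v4sf_add and v4sf_store are hypothetical helpers made up for this example (they are not part of xmmintrin.h or gdc), and the addition uses GCC's generic vector `+` operator rather than __builtin_ia32_addps so that the sketch does not depend on an x86 target. On x86 with SSE enabled, GCC will typically lower that `+` to a single addps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the __v4sf typedef quoted from xmmintrin.h:
   four floats packed into one 16-byte vector type. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: build a vector from a plain float[4]. */
static v4sf v4sf_load(const float in[4])
{
    v4sf v;
    assert(in != NULL);
    memcpy(&v, in, sizeof v);   /* byte copy avoids any aliasing questions */
    return v;
}

/* Hypothetical helper: element-wise add using GCC's generic vector
   operator (an assumption of this sketch; the post itself is about
   the __builtin_ia32_addps builtin). */
static v4sf v4sf_add(v4sf a, v4sf b)
{
    return a + b;
}

/* Hypothetical helper: copy a vector back out to a plain float[4]. */
static void v4sf_store(float out[4], v4sf v)
{
    assert(out != NULL);
    memcpy(out, &v, sizeof v);
}
```

Loading {1, 2, 3, 4} and {10, 20, 30, 40}, adding them with v4sf_add and storing the result yields {11, 22, 33, 44}. The point of the sketch is that the vector type is distinct from float[4] as far as the compiler is concerned, which is why the loads and stores go through memcpy instead of a cast.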
