Iain Buclaw Wrote:
> == Quote from Jerry Quinn's article
> > Iain Buclaw Wrote:
> > > == Quote from Mike Farnsworth's article
> > > > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > > > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > > > the code generator did with it. It was able to partially drop into SSE
> > > > registers and instructions, but not as well as I had hoped from writing
> > > > "regular" D code.
> > > > I poked through the builtins that get pulled into d-builtins.c /
> > > > d-builtins2.cc but I don't see anything that might be pulling in
> > > > definitions such as __builtin_ia32_* for SSE, for example.
> > > > How hard would it be to get some sort of vector attribute attached to a
> > > > type (or just plain introduce v4sf, __m128, or something like that) and
> > > > get those SIMD builtins available?
> > >
> > > Saying that, a workaround is to use array types.
> > > typedef float[4] __m128;
> > > typedef float[4] __v4sf;
> > >
> > >
> > > All the more reason to show you that pragma(attribute) is still very
> > > incomplete to use. Any ideas to improve it are welcome though. :)
> >
> > The workaround actually looks like a cleaner way to define types for vector
> > intrinsics. How hard would it be to export vector intrinsics so the API
> > expects float[4], for example?
>
> I haven't given much thought to what the internal representation could be,
> but I'd lean towards using unions in D code for use in the language, as it's
> probably the most portable.
>
> For example, one of the older 'hello vectors' I know of:
>
> import std.c.stdio;
>
> pragma(set_attribute, __v4sf, vector_size(16));
> typedef float __v4sf;
>
> union f4vector
> {
>     __v4sf v;
>     float[4] f;
> }
>
> int main()
> {
>     f4vector a, b, c;
>
>     a.f = [1, 2, 3, 4];
>     b.f = [5, 6, 7, 8];
>
>     c.v = a.v + b.v;
>     printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
>
>     return 0;
> }
>
>
> Compile: gdc -c -g -msse hellovector.d
> Dump Object: objdump -dS hellovector.o
>
> And the output of the SIMD operation speaks for itself:
>
>     c.v = a.v + b.v;
>     xorps  %xmm1,%xmm1
>     movlps %gs:0x0,%xmm1
>     movhps %gs:0x8,%xmm1
>     xorps  %xmm0,%xmm0
>     movlps %gs:0x0,%xmm0
>     movhps %gs:0x8,%xmm0
>     addps  %xmm1,%xmm0
>     movlps %xmm0,%gs:0x0
>     movhps %xmm0,%gs:0x8
>
> Regards.
> Iain
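For comparison, the plain float[4] workaround discussed in the thread can be written in portable D with no GDC pragma at all. This is a minimal sketch under stated assumptions, not GDC's API: the name Vec4 and its operator set are hypothetical, and the lane-by-lane work is expressed with D's built-in array operations, leaving it entirely to the backend whether those become SSE instructions.

```d
// Hypothetical Vec4: the "float[4] as a vector type" workaround in
// portable D, with no GDC-specific pragma. The frontend only sees a
// static array; any SSE code generation is up to the backend.
struct Vec4
{
    float[4] f;

    // Lane-by-lane a + b, a - b, a * b and a / b via D array operations.
    Vec4 opBinary(string op)(const Vec4 rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        Vec4 r;
        mixin("r.f[] = f[] " ~ op ~ " rhs.f[];");
        return r;
    }
}

unittest
{
    Vec4 a, b;
    a.f = [1, 2, 3, 4];
    b.f = [5, 6, 7, 8];
    auto c = a + b;
    assert(c.f == [6.0f, 8.0f, 10.0f, 12.0f]);
}
```

Because nothing here tells the compiler the type is a vector, addps is not guaranteed; the sketch mainly shows what an API that "expects float[4]" could look like from the caller's side.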
Huh, that's actually pretty promising. Hooray for gcc's vector ops. =)

I suppose I should still try to beat up on the __builtin_ia32_* stuff to make sure that can work, but if the codegen already gets us that far, that's pretty good. With a little -O3 it might even clean up some of the extraneous instructions, especially across a sequence of vector operations. The intrinsics will get us some of the more interesting things like movemasks, shuffles, vector compares, etc. As long as the union doesn't add a bunch of dead-weight loads and stores to the generated code, this might work nicely.

However, I'll bet dmdfe doesn't understand that __v4sf isn't really just a float... so at some point that will need to be fixed, so that there is no accidental slicing and no invalid array/structure sizes, etc.

-Mike
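The intrinsics mentioned above have simple lane-by-lane meanings, so one way to check a future __builtin_ia32_* binding is against a scalar reference. This sketch is hypothetical: cmpLt and moveMask are illustrative names, not existing GDC functions. They model the SSE packed less-than compare (cmpltps: all ones or zero per lane, false for NaN) and movmskps (the sign bit of each lane packed into the low four bits of an int) over plain float[4].

```d
// Hypothetical scalar reference models of two SSE operations, intended
// only as a test oracle for an intrinsic binding, not as the intrinsics.
import std.math : signbit;

// Models cmpltps: lane i is all ones if a[i] < b[i], otherwise zero.
// Comparisons involving NaN are false, so those lanes become zero.
uint[4] cmpLt(const float[4] a, const float[4] b)
{
    uint[4] r;
    foreach (i; 0 .. 4)
        r[i] = a[i] < b[i] ? uint.max : 0;
    return r;
}

// Models movmskps: bit i of the result is the sign bit of lane i.
int moveMask(const float[4] v)
{
    int mask = 0;
    foreach (i; 0 .. 4)
        if (signbit(v[i]))
            mask |= 1 << i;
    return mask;
}

unittest
{
    float[4] v = [1, -2, 3, -4];
    assert(moveMask(v) == 0b1010); // lanes 1 and 3 are negative
}
```

A real binding could then be tested by feeding the same inputs to both and comparing results lane by lane.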
