On 8/01/12 1:32 AM, Manu wrote:
On 8 January 2012 02:54, Peter Alexander wrote:
I agree with Manu that we should just have a single type like __m128
in MSVC. The other types and their conversions should be solvable in
a library with something like strong typedefs.
Walter put in a reasonable effort to sway me to his side of the fence
last night. I'm still not entirely sold that implementation inside the
language is necessary to achieve these details, but I don't have enough
background info to argue, and I'm not the one that has to maintain the
code :)
Here are some points we discussed... how do we do these (efficiently) in
a library?
Just to be clear, it was only the types and conversions that I thought
would be suitable for a library. Operations, along with their
optimisations, are best left to the compiler.
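As a rough illustration of the "single raw type plus strong typedefs" idea, here is a minimal sketch. Raw128, Vec, reinterpret and toArray are hypothetical names made up for this example, and Raw128 is only a 16-byte aligned stand-in for something like __m128, not a real compiler type. It shows the type distinction and an explicit conversion, nothing about code generation.

```d
// Hypothetical stand-in for a single untyped 128-bit vector type (like __m128).
struct Raw128
{
    align(16) ubyte[16] bytes;
}

// Strong typedef: same 16 bytes, but a distinct type per element type.
struct Vec(T) if (16 % T.sizeof == 0)
{
    enum length = 16 / T.sizeof;
    Raw128 raw;

    this(const T[length] values)
    {
        // Copy the element bytes into the raw storage.
        raw.bytes[] = (cast(const(ubyte)*) values.ptr)[0 .. 16];
    }

    // Read the raw storage back as elements of type T.
    T[length] toArray() const
    {
        T[length] result;
        (cast(ubyte*) result.ptr)[0 .. 16] = raw.bytes[];
        return result;
    }

    // Explicit bit-for-bit reinterpretation as another element type.
    Vec!U reinterpret(U)() const
    {
        Vec!U result;
        result.raw = raw;
        return result;
    }
}

alias Int4 = Vec!int;
alias Float4 = Vec!float;

void main()
{
    auto f = Float4([1.0f, 0.0f, 0.0f, 0.0f]);
    // Int4 wrong = f;                // would not compile: distinct types
    Int4 bits = f.reinterpret!int();  // the conversion has to be spelled out
    assert(bits.toArray()[0] == 0x3F800000); // IEEE 754 bit pattern of 1.0f
    assert(f.toArray() == [1.0f, 0.0f, 0.0f, 0.0f]);
}
```

The wrapper only buys type safety. The arithmetic and its optimisation would still have to come from the compiler.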
** Literal syntax.. and constant folding:
Constants and literals also need to be aligned. If we use array syntax
to express literals, this will be a problem.
int4 v = [ 1,2,3,4 ] + [ 5,6,7,8 ];
Any constant expressions need to be simplified at compile time:
int4 vec = [ 6,8,10,12 ];
Perhaps this is possible with CTFE? Or will it be automatic if you
express literals as if they were arrays?
You could use array syntax for vector literals, as long as they are
stored directly into vector variables. e.g.
immutable int4 a = [1, 2, 3, 4];
immutable int4 b = [5, 6, 7, 8];
int4 v = a + b;
Constant folding can be done by the compiler, although I don't think this is
a priority.
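On the CTFE question, a small sketch of how the folding could be forced from library code. It uses a plain int[4] as a stand-in for int4, and add is a hypothetical helper, so it says nothing about alignment or about how a real vector type would be stored.

```d
// Hypothetical element-wise add on a plain int[4], standing in for int4.
int[4] add(const int[4] a, const int[4] b) pure
{
    int[4] r;
    foreach (i; 0 .. 4)
        r[i] = a[i] + b[i];
    return r;
}

// An enum initialiser must be known at compile time, so this runs add() via CTFE.
enum int[4] folded = add([1, 2, 3, 4], [5, 6, 7, 8]);

// Checked by the compiler, not at run time.
static assert(folded == [6, 8, 10, 12]);

void main()
{
    int[4] v = folded; // initialised from the already-folded constant
    assert(v == [6, 8, 10, 12]);
}
```

Without the enum (or another compile-time context) the call would be an ordinary run-time call unless the optimiser happens to fold it.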
** Expression interpretation/simplification:
float4 v = -b + a;
Obviously, this should be simplified to 'a - b'.
float4 v = a*b + c;
This should use a multiply-accumulate opcode on most architectures:
FMADDPS v, a, b, c
The compiler should make these decisions, just like it does with int/float
etc. In some cases these kinds of simplifications can affect the result
due to numeric issues.
You can use expression templates for this sort of thing as well, but
they are a horrible mess, so I don't think I'd like to see them.
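For reference, a minimal sketch of what the expression template route looks like for the a*b + c case. Float4 and MulExpr here are hypothetical library types built on float[4], and the "fused" step is just a scalar loop marking where a library would emit a single multiply-add.

```d
// Hypothetical library vector type built on a plain float[4].
struct Float4
{
    float[4] e;

    // a * b does no arithmetic yet: it returns a node that records its operands.
    MulExpr opBinary(string op : "*")(Float4 rhs) const
    {
        return MulExpr(this, rhs);
    }
}

// Expression node representing a pending a * b.
struct MulExpr
{
    Float4 a, b;

    // (a * b) + c: the whole pattern is visible here, so this is the place
    // where a library could issue one fused multiply-add instead of two ops.
    Float4 opBinary(string op : "+")(Float4 c) const
    {
        Float4 r;
        foreach (i; 0 .. 4)
            r.e[i] = a.e[i] * b.e[i] + c.e[i];
        return r;
    }

    // A product used on its own is evaluated when converted to Float4.
    Float4 eval() const
    {
        Float4 r;
        foreach (i; 0 .. 4)
            r.e[i] = a.e[i] * b.e[i];
        return r;
    }

    alias eval this;
}

void main()
{
    auto a = Float4([1.0f, 2.0f, 3.0f, 4.0f]);
    auto b = Float4([2.0f, 2.0f, 2.0f, 2.0f]);
    auto c = Float4([1.0f, 1.0f, 1.0f, 1.0f]);

    Float4 v = a * b + c; // goes through MulExpr's "+" overload
    assert(v.e == [3.0f, 5.0f, 7.0f, 9.0f]);

    Float4 p = a * b;     // plain product, evaluated via alias this
    assert(p.e == [2.0f, 4.0f, 6.0f, 8.0f]);
}
```

This handles exactly one pattern. Every further rewrite (c + a*b, -b + a, and so on) needs another node type or overload, which is where the mess comes from.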
** Typed debug info
In a debugger it's nice to inspect variables in their supposed type.
Can probably use unions to do this... probably wouldn't be as nice though.
Good point. I'm not an expert on this, but I suspect that a union would
be good enough?
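A sketch of the union approach, assuming a hypothetical Vec128 type: each member is a differently typed view of the same 16 bytes, so a debugger that understands unions can display whichever view is wanted.

```d
// Hypothetical 128-bit value with several typed views of the same storage.
union Vec128
{
    float[4]  f; // view as four floats
    int[4]    i; // view as four 32-bit integers
    ubyte[16] b; // view as raw bytes
}

void main()
{
    Vec128 v;
    v.f = [1.0f, 0.0f, 0.0f, 0.0f];

    // All members alias the same memory, so the int view shows the float bits.
    assert(v.i[0] == 0x3F800000); // IEEE 754 bit pattern of 1.0f
    assert(Vec128.sizeof == 16);
}
```

The downside is that the debugger shows every view at once and cannot know which one the code intended.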
** God knows what other optimisations
float4 v = [ 0,0,0,0 ]; // XOR v
etc...
Again, I think you could use expression templates for this, but it's so
much simpler to leave this optimisation to the compiler.
Even if the compiler doesn't do it, it's not difficult to do it manually
when you really need it:
float4 v = void;
asm { pxor v, v; }
Honestly, I'm not too bothered with these types of optimisations. As
long as the compiler does the register allocation and instruction
scheduling for me, I would be 99% happy because those things are the
most tedious when trying to write structured code. I can easily enough
change (-b + a) to (a - b) if that's faster, or insert specific
instructions for generating vector constants, or do constant folding
manually.
Of course, it would be nice if the compiler did them, but that's just
icing on the cake. The meat of the problem is register allocation.
I don't know what amount of this is achievable with libraries, but
Walter seems to think this will all work much better in the language...
I'm inclined to trust his judgement.
I agree.