On 15.01.2012 at 11:45, Manu <[email protected]> wrote:
On 15 January 2012 08:16, Sean Cavanaugh <[email protected]> wrote:
On 1/15/2012 12:09 AM, Walter Bright wrote:
On 1/14/2012 9:58 PM, Sean Cavanaugh wrote:
MS has three types, __m128, __m128i and __m128d (float, int, double)
Six if you count AVX's 256 forms.
On 1/7/2012 6:54 PM, Peter Alexander wrote:
On 7/01/12 9:28 PM, Andrei Alexandrescu wrote:
I agree with Manu that we should just have a single type like __m128 in
MSVC. The other types and their conversions should be solvable in a
library with something like strong typedefs.
The trouble with MS's scheme is that, given the following:
__m128i v;
v += 2;
Can't tell what to do. With D,
int4 v;
v += 2;
it's clear (add 2 to each of the 4 ints).
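
To illustrate the ambiguity, here is a minimal C++ sketch using SSE2 intrinsics. The helper names (add2_as_int32, add2_as_int16, first_int32) are made up for this example; only the _mm_* calls are real intrinsics. The same __m128i bits give different results depending on which lane width "+= 2" is assumed to mean:

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics
#include <cstdint>

// Hypothetical helper: treat v as 4 x 32-bit ints and add 2 to each lane.
inline __m128i add2_as_int32(__m128i v) {
    return _mm_add_epi32(v, _mm_set1_epi32(2));
}

// Hypothetical helper: treat the same bits as 8 x 16-bit ints and add 2 to each lane.
inline __m128i add2_as_int16(__m128i v) {
    return _mm_add_epi16(v, _mm_set1_epi16(2));
}

// Hypothetical helper: read the lowest 32 bits of the register as an int.
inline int32_t first_int32(__m128i v) {
    return _mm_cvtsi128_si32(v);
}
```

Starting from a register holding four 32-bit ones, the 32-bit add yields 3 in the lowest lane, while the 16-bit add yields 0x00020003 there, because the upper 16-bit half of each int receives its own +2. A typed int4 removes that choice from the call site.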
Working with their intrinsics in their raw form for real code is pure
insanity :) You need to wrap it all with a good math library (even if 90%
of the library is the intrinsics wrapped into __forceinlined functions), so
you can start having sensible operator overloads, and so you can write code
that is readable.
if (any4(a > b))
{
// do stuff
}
is way way way better than (pseudocode)
if (_mm_movemask_ps(_mm_cmpgt_ps(a, b)) != 0)
{
}
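
A rough sketch of what such a wrapper might look like in C++. The float4 struct, make_float4, any4 and all4 are hypothetical names for this example, not an existing library; the _mm_* calls are the standard SSE intrinsics:

```cpp
#include <xmmintrin.h>  // SSE float intrinsics

// Hypothetical thin wrapper around the raw register type.
struct float4 { __m128 v; };

// Hypothetical constructor: lanes in memory order x, y, z, w.
inline float4 make_float4(float x, float y, float z, float w) {
    return { _mm_setr_ps(x, y, z, w) };
}

// Lane-wise compare: each lane becomes all-ones where a > b, else all-zeros.
inline float4 operator>(float4 a, float4 b) {
    return { _mm_cmpgt_ps(a.v, b.v) };
}

// True if at least one lane of the mask is set.
inline bool any4(float4 mask) {
    return _mm_movemask_ps(mask.v) != 0;
}

// True only if all four lanes of the mask are set.
inline bool all4(float4 mask) {
    return _mm_movemask_ps(mask.v) == 0x0F;
}
```

Note that the movemask result is compared against 0 for "any" and against 0x0F for "all"; mixing those up is exactly the kind of mistake the named wrappers hide.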
and (if the ternary operator were overloadable in C++)
float4 foo = (a > b) ? c : d;
would be better than
float4 mask = _mm_cmpgt_ps(a, b);
float4 foo = _mm_or_ps(_mm_and_ps(mask, c), _mm_andnot_ps(mask, d));
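
Since ?: cannot be overloaded, a library would probably expose this as a named function instead. A minimal sketch, assuming a hypothetical float4 wrapper struct and hypothetical helpers select and lane (the _mm_* intrinsics are real):

```cpp
#include <xmmintrin.h>  // SSE float intrinsics

// Hypothetical thin wrapper around the raw register type.
struct float4 { __m128 v; };

// Lane-wise compare: each lane becomes all-ones where a > b, else all-zeros.
inline float4 operator>(float4 a, float4 b) {
    return { _mm_cmpgt_ps(a.v, b.v) };
}

// Hypothetical lane-wise "mask ? c : d".
// _mm_andnot_ps(mask, d) computes (~mask) & d, so d survives where mask is clear.
inline float4 select(float4 mask, float4 c, float4 d) {
    return { _mm_or_ps(_mm_and_ps(mask.v, c.v), _mm_andnot_ps(mask.v, d.v)) };
}

// Hypothetical helper: extract lane i (0..3) for inspection.
inline float lane(float4 f, int i) {
    float out[4];
    _mm_storeu_ps(out, f.v);
    return out[i];
}
```

With this, the two-line intrinsic version collapses to float4 foo = select(a > b, c, d); which is about as close to the ternary form as C++ allows.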
Yep, it's coming... baby steps :)
Walter: I told you games devs would be all over this! :P
And even compression algorithms. I found one written in C that uses
external .asm files, which are compiled into object files with NASM and
passed on the linker command line. They contain MMX/SSE code, depending on
the processor you plan to target. The author claims that the MMX version of
the 'outsourced' routines runs 8x faster. I didn't verify this, but the idea
that these instructions become part of the language and easy to use for
regular programmers like me (and not just console game developers) is
exciting. I bet more programs could benefit from SSE than is obvious, and
there is code that could be rewritten in a way that lets multiple data sets
be processed simultaneously.