On Fri, 20 Feb 2009 08:55:16 +0300, Denis Koroskin <[email protected]>
wrote:
On Fri, 20 Feb 2009 06:22:40 +0300, Andrei Alexandrescu
<[email protected]> wrote:
Denis Koroskin wrote:
On Thu, 19 Feb 2009 23:05:34 +0300, Andrei Alexandrescu
<[email protected]> wrote:
Denis Koroskin wrote:
On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm
<[email protected]> wrote:
Since (SIMD) vectors are so common and every reasonable system
supports them in one way or another (and scalar emulation of them
is rather simple), why not have support for this in D directly?
Yes, the array operations are nice (and one of the main reasons for
why I like D :) ), but have the problem that an array of floats
must be aligned on float boundaries and not vector boundaries. In
my mind vectors are a primitive data type that should be exposed by
the programming language.
Something OpenCL-like:
float4 vec;
vec.xyzw = {1.0,1.0, 1.0, 1.0}; // assignment
vec.xyzw = vec.wyxz; // permutation
vec[i] = 1.0; // indexing
And then we can easily imagine some extra nice features to have
with respect to operators:
vec ^ vec2; // 3D cross product for float vectors, xor for int vectors
Has this been discussed before?
/ Mattias
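To make the "scalar emulation is rather simple" remark concrete, here
is a minimal sketch in C (not D, and not any existing library). The
type name float4 and the helpers float4_swizzle and float4_cross are
hypothetical names chosen for illustration; the swizzle takes lane
indices rather than the .wyxz syntax proposed above.

```c
/* Hypothetical scalar stand-in for a built-in float4 (illustrative only). */
typedef struct { float v[4]; } float4;

/* Generic permutation: result lane k is taken from source lane ik.
   float4_swizzle(a, 3, 1, 0, 2) mimics the proposed a.wyxz. */
static float4 float4_swizzle(float4 a, int i0, int i1, int i2, int i3)
{
    float4 r = {{ a.v[i0], a.v[i1], a.v[i2], a.v[i3] }};
    return r;
}

/* 3D cross product on lanes x, y, z; lane w of the result is set to 0. */
static float4 float4_cross(float4 a, float4 b)
{
    float4 r = {{
        a.v[1] * b.v[2] - a.v[2] * b.v[1],
        a.v[2] * b.v[0] - a.v[0] * b.v[2],
        a.v[0] * b.v[1] - a.v[1] * b.v[0],
        0.0f
    }};
    return r;
}
```

A compiler with a real vector type could map the same two operations
onto shuffle and multiply/subtract instructions instead of four scalar
loads each.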
I don't see any reason why float4 can't be made a library type.
Yah, I was thinking the same:
struct float4
{
__align(16) float[4] data; // right syntax and value?
alias data this;
}
This looks like something that should go into std.matrix pronto. It
even has value semantics even though fixed arrays don't :o/.
Andrei
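For comparison only, the same idea can be sketched in C11, where the
alignment syntax is standardized. This is an analogue, not D code: the
names float4_aligned and is_aligned16 are made up for this example,
and 16 bytes is assumed because that is what aligned SSE loads and
stores such as _mm_load_ps require.

```c
#include <stdalign.h>  /* C11 alignas / alignof */
#include <stdint.h>    /* uintptr_t */

/* Four floats forced onto a 16-byte boundary (hypothetical type name). */
typedef struct {
    alignas(16) float data[4];
} float4_aligned;

/* Compile-time checks: the struct is 16-byte aligned and has no padding. */
_Static_assert(alignof(float4_aligned) == 16, "expected 16-byte alignment");
_Static_assert(sizeof(float4_aligned) == 16, "expected exactly four floats");

/* Run-time check that an address sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

An array of float4_aligned is then aligned element by element, which
is the property a plain float array does not guarantee.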
That would be great. If float4 makes its way into D, I'll share our
blazing fast math code with the community (most common operations on
vectors, matrices, quaternions etc). It is written entirely in SSE
(intrinsics, not asm; there is a problem with inlining asm in D, IIRC.
Can anyone elaborate on this?) and *very* fast. According to our
benchmarks, that's the best we can squeeze out of the hardware.
I know LLVM supports a *very* wide range of intrinsics:
http://www.cs.ucla.edu/classes/spring08/cs259/llvm-2.2/include/llvm/Intrinsics.gen
Hopefully they will get into LDC (and DMD *hint* Walter *hint*) very
soon.
Put me down for that. What do I need to do?
Andrei
Convince Walter to add a float4 type and some intrinsics to DMD (I'll
post a list of those we use later); LDC will follow, I believe.
There should be some type that would be treated specially. After all,
intrinsics have function signatures and those should specify some
concrete types.
Here is some nice documentation on the MMX, SSE and SSE2 intrinsics:
http://msdn.microsoft.com/en-us/library/y0dh78ez(VS.80).aspx
Here are some quick statistics on which intrinsics are used in our code
and how many times.
Note that these counts don't map directly to how many times each
intrinsic is *actually* used in user code.
They may give Walter some idea of priorities (intrinsics that aren't
often used may be given lower priority, for example).
Arithmetic Operations (Floating-Point SSE2 Intrinsics)
http://msdn.microsoft.com/en-us/library/708ya3be(VS.80).aspx
_mm_add_ss - 2
_mm_add_ps - 48
_mm_sub_ss - 4
_mm_sub_ps - 24
_mm_mul_ss - 2
_mm_mul_ps - 100
_mm_div_ss - 0
_mm_div_ps - 1
_mm_sqrt_ss - 0
_mm_sqrt_ps - 0
_mm_rcp_ss - 1
_mm_rcp_ps - 0
_mm_rsqrt_ss - 0
_mm_rsqrt_ps - 1
_mm_min_ss - 0
_mm_min_ps - 1
_mm_max_ss - 0
_mm_max_ps - 1
Store Operations (SSE)
http://msdn.microsoft.com/en-us/library/ybhzf6dk(VS.80).aspx
_mm_store_ss - 1
_mm_store1_ps - 0
_mm_store_ps1 - 0
_mm_store_ps - 0
_mm_storeu_ps - 0
_mm_storer_ps - 0
_mm_move_ss - 2
Set Operations (SSE)
http://msdn.microsoft.com/en-us/library/wbzwdy6a(VS.80).aspx
_mm_set_ss - 0
_mm_set1_ps - 0
_mm_set_ps1 - 19
_mm_set_ps - 45
_mm_setr_ps - 0
_mm_setzero_ps - 2
Logical Operations (SSE)
http://msdn.microsoft.com/en-us/library/9759as73(VS.80).aspx
_mm_and_ps - 2
_mm_andnot_ps - 0
_mm_or_ps - 0
_mm_xor_ps - 3
Miscellaneous Instructions That Use Streaming SIMD Extensions
http://msdn.microsoft.com/en-us/library/dzs626wx.aspx
_mm_shuffle_ps - 124
_mm_shuffle_pi16 - 0
_mm_unpackhi_ps - 0
_mm_unpacklo_ps - 0
_mm_loadh_pi - 0
_mm_storeh_pi - 0
_mm_movehl_ps - 0
_mm_movelh_ps - 0
_mm_loadl_pi - 0
_mm_storel_pi - 0
_mm_movemask_ps - 0
_mm_getcsr - 0
_mm_setcsr - 0
_mm_extract_si64 - 0
_mm_extracti_si64 - 0
_mm_insert_si64 - 0
_mm_inserti_si64 - 0
Comparison Intrinsics (SSE)
http://msdn.microsoft.com/en-us/library/w8kez9sf(VS.80).aspx
Not used
Conversion Operations (SSE)
http://msdn.microsoft.com/en-us/library/0d4dtzhb(VS.80).aspx
Not used
Macros
_MM_SHUFFLE - 100
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | ((fp0)))
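As an illustration of why _mm_shuffle_ps, _mm_mul_ps and _mm_sub_ps
tend to dominate counts like these, here is a minimal C sketch of a 3D
cross product built from exactly those three intrinsics plus the
_MM_SHUFFLE macro. It is not taken from the code base the statistics
describe; the function name cross3 is hypothetical, and lane 0 is
assumed to hold x, lane 1 y, lane 2 z.

```c
#include <xmmintrin.h>  /* SSE intrinsics and the _MM_SHUFFLE macro */

/* 3D cross product of a and b; lane w of the result is 0 for finite inputs.
   Uses the identity  a x b = (a * b.yzx - a.yzx * b).yzx  */
static __m128 cross3(__m128 a, __m128 b)
{
    /* _MM_SHUFFLE(3, 0, 2, 1) picks source lanes (1, 2, 0, 3) for result
       lanes (0, 1, 2, 3), i.e. it reorders x,y,z,w into y,z,x,w. */
    __m128 a_yzx = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 b_yzx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));

    /* c = (ax*by - ay*bx, ay*bz - az*by, az*bx - ax*bz, 0) = (z, x, y, 0) */
    __m128 c = _mm_sub_ps(_mm_mul_ps(a, b_yzx), _mm_mul_ps(a_yzx, b));

    /* Rotate the lanes once more so the result reads (x, y, z, 0). */
    return _mm_shuffle_ps(c, c, _MM_SHUFFLE(3, 0, 2, 1));
}
```

Three shuffles, two multiplies and one subtract per cross product is
roughly the ratio seen in the counts above. Note that _mm_set_ps takes
its arguments from the highest lane down, so _mm_set_ps(w, z, y, x)
places x in lane 0.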