On Fri, 20 Feb 2009 08:55:16 +0300, Denis Koroskin <[email protected]> wrote:

On Fri, 20 Feb 2009 06:22:40 +0300, Andrei Alexandrescu <[email protected]> wrote:

Denis Koroskin wrote:
On Thu, 19 Feb 2009 23:05:34 +0300, Andrei Alexandrescu <[email protected]> wrote:

Denis Koroskin wrote:
On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm <[email protected]> wrote:

Since (SIMD) vectors are so common and every reasonable system supports them in one way or another (and scalar emulation is rather simple), why not support them in D directly?

Yes, the array operations are nice (and one of the main reasons I like D :) ), but they have the problem that an array of floats must be aligned on float boundaries and not vector boundaries. In my mind, vectors are a primitive data type that should be exposed by the programming language.

Something OpenCL-like:

    float4 vec;
    vec.xyzw = {1.0,1.0, 1.0, 1.0}; // assignment
    vec.xyzw = vec.wyxz; // permutation
    vec[i] = 1.0; // indexing

And then we can easily imagine some extra nice features with respect to operators:

    vec ^ vec2; // 3D cross product for float vectors, xor for int vectors

Has this been discussed before?

/ Mattias

I don't see any reason why float4 can't be made a library type.

Yah, I was thinking the same:

struct float4
{
     align(16) float[4] data; // D spells the attribute align(N); 16 bytes matches SSE alignment
     alias data this;
}

This looks like something that should go into std.matrix pronto. It even has value semantics even though fixed arrays don't :o/.
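
For readers who want to compare with C, here is a minimal sketch of the same library-type idea, assuming a C11 compiler. The names `float4`, `is_vector_aligned` and `float4_splat` are hypothetical and chosen for illustration; nothing here comes from the thread or from any D library.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical C analogue of the float4 struct above (not from the thread).
   alignas(16) forces vector alignment instead of plain float alignment. */
typedef struct {
    alignas(16) float data[4];
} float4;

/* Returns 1 when p sits on a 16-byte boundary, which aligned SSE
   loads and stores require. */
static int is_vector_aligned(const void *p) {
    return ((uintptr_t)p % 16u) == 0;
}

/* Builds a float4 with all four lanes set to v. */
static float4 float4_splat(float v) {
    float4 r = { { v, v, v, v } };
    assert(is_vector_aligned(&r)); /* holds for any object of this type */
    return r;
}
```

Because the alignment is part of the type, every `float4` object is vector-aligned wherever the compiler places it, which is the property a plain array of floats lacks.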


Andrei

That would be great. If float4 makes its way into D, I'll share our blazing fast math code with the community (most common operations on vectors, matrices, quaternions etc). It is written entirely in SSE (intrinsics, not asm; there is a problem with inlining asm in D, IIRC. Can anyone elaborate on this?) and *very* fast. According to our benchmarks, that's the best we can squeeze out of the hardware.
I know LLVM has support for a *very* wide range of intrinsics:
http://www.cs.ucla.edu/classes/spring08/cs259/llvm-2.2/include/llvm/Intrinsics.gen
Hopefully they will get into LDC (and DMD *hint* Walter *hint*) very soon.


Put me down for that. What do I need to do?

Andrei

Convince Walter to add a float4 type and some intrinsics to DMD (I'll post a list of those we use later); LDC will follow, I believe. There should be some type that is treated specially. After all, intrinsics have function signatures, and those should specify concrete types.


Here is some nice documentation on the MMX, SSE, and SSE2 intrinsics:
http://msdn.microsoft.com/en-us/library/y0dh78ez(VS.80).aspx

Here are some quick statistics on which intrinsics are used in our code and how many times. Note that these counts don't map directly to how many times each one is *actually* used in user code.

This may help Walter set priorities (intrinsics that are rarely used may be given lower priority, for example).

Arithmetic Operations (Floating-Point SSE2 Intrinsics)
http://msdn.microsoft.com/en-us/library/708ya3be(VS.80).aspx
_mm_add_ss - 2
_mm_add_ps - 48
_mm_sub_ss - 4
_mm_sub_ps - 24
_mm_mul_ss - 2
_mm_mul_ps - 100
_mm_div_ss - 0
_mm_div_ps - 1
_mm_sqrt_ss - 0
_mm_sqrt_ps - 0
_mm_rcp_ss - 1
_mm_rcp_ps - 0
_mm_rsqrt_ss - 0
_mm_rsqrt_ps - 1
_mm_min_ss - 0
_mm_min_ps - 1
_mm_max_ss - 0
_mm_max_ps - 1
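
As an illustration of how the most frequent entries above (`_mm_mul_ps`, `_mm_add_ps`) are typically combined, here is a minimal C sketch of a four-lane multiply-add. The helper `madd4` and the `USE_SSE` guard are assumptions made for this example, not code from the thread; the scalar branch is the kind of simple emulation mentioned earlier in the thread.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i = 0..3.
   The pointers need not be 16-byte aligned (unaligned loads are used). */
static void madd4(const float *a, const float *b, const float *c, float *out) {
    assert(a && b && c && out);
#if USE_SSE
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    /* One multiply and one add process all four lanes at once. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_mul_ps(va, vb), vc));
#else
    /* Scalar emulation for targets without SSE. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
#endif
}
```

With 16-byte aligned data, `_mm_load_ps` and `_mm_store_ps` could replace the unaligned variants, which is where a vector-aligned float4 type would help.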

Store Operations (SSE)
http://msdn.microsoft.com/en-us/library/ybhzf6dk(VS.80).aspx
_mm_store_ss - 1
_mm_store1_ps - 0
_mm_store_ps1 - 0
_mm_store_ps - 0
_mm_storeu_ps - 0
_mm_storer_ps - 0
_mm_move_ss - 2

Set Operations (SSE)
http://msdn.microsoft.com/en-us/library/wbzwdy6a(VS.80).aspx
_mm_set_ss - 0
_mm_set1_ps - 0
_mm_set_ps1 - 19
_mm_set_ps - 45
_mm_setr_ps - 0
_mm_setzero_ps - 2
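
One detail of the set operations that is easy to get wrong is argument order: `_mm_set_ps` takes the highest lane first, while `_mm_setr_ps` takes the lowest lane first. The C sketch below stores both results to memory so the lane order can be inspected. The helper names and the `USE_SSE` guard are hypothetical, introduced only for this example.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

/* Hypothetical helper: writes the lanes of _mm_set_ps(e3, e2, e1, e0)
   to out, lane 0 first. */
static void lanes_of_set_ps(float e3, float e2, float e1, float e0, float *out) {
    assert(out);
#if USE_SSE
    _mm_storeu_ps(out, _mm_set_ps(e3, e2, e1, e0));
#else
    /* Scalar emulation: the last argument lands in lane 0. */
    out[0] = e0; out[1] = e1; out[2] = e2; out[3] = e3;
#endif
}

/* Hypothetical helper: writes the lanes of _mm_setr_ps(e0, e1, e2, e3)
   to out, lane 0 first. */
static void lanes_of_setr_ps(float e0, float e1, float e2, float e3, float *out) {
    assert(out);
#if USE_SSE
    _mm_storeu_ps(out, _mm_setr_ps(e0, e1, e2, e3));
#else
    /* Scalar emulation: the first argument lands in lane 0. */
    out[0] = e0; out[1] = e1; out[2] = e2; out[3] = e3;
#endif
}
```

Calling `lanes_of_set_ps(4, 3, 2, 1, out)` and `lanes_of_setr_ps(1, 2, 3, 4, out)` fills `out` with the same four values in the same order.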

Logical Operations (SSE)
http://msdn.microsoft.com/en-us/library/9759as73(VS.80).aspx
_mm_and_ps - 2
_mm_andnot_ps - 0
_mm_or_ps - 0
_mm_xor_ps - 3

Miscellaneous Instructions That Use Streaming SIMD Extensions
http://msdn.microsoft.com/en-us/library/dzs626wx.aspx
_mm_shuffle_ps - 124
_mm_shuffle_pi16 - 0
_mm_unpackhi_ps - 0
_mm_unpacklo_ps - 0
_mm_loadh_pi - 0
_mm_storeh_pi - 0
_mm_movehl_ps - 0
_mm_movelh_ps - 0
_mm_loadl_pi - 0
_mm_storel_pi - 0
_mm_movemask_ps - 0
_mm_getcsr - 0
_mm_setcsr - 0
_mm_extract_si64 - 0
_mm_extracti_si64 - 0
_mm_insert_si64 - 0
_mm_inserti_si64 - 0

Comparison Intrinsics (SSE)
http://msdn.microsoft.com/en-us/library/w8kez9sf(VS.80).aspx
Not used

Conversion Operations (SSE)
http://msdn.microsoft.com/en-us/library/0d4dtzhb(VS.80).aspx
Not used

Macros
_MM_SHUFFLE - 100

    #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | ((fp0)))
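
To show why `_mm_shuffle_ps`, `_MM_SHUFFLE`, `_mm_mul_ps` and `_mm_sub_ps` tend to appear together, here is a minimal C sketch of a 3D cross product, a common textbook use of these intrinsics. It is not the code counted in the statistics above; the helper `cross3` and the `USE_SSE` guard are assumptions for illustration. Vectors are stored as x, y, z, w with w ignored.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

/* Hypothetical helper: out.xyz = a.xyz cross b.xyz, out.w = 0 for finite w.
   Each argument points at four floats laid out as x, y, z, w. */
static void cross3(const float *a, const float *b, float *out) {
    assert(a && b && out);
#if USE_SSE
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* _MM_SHUFFLE(3,0,2,1) selects lanes (y, z, x, w);
       _MM_SHUFFLE(3,1,0,2) selects lanes (z, x, y, w). */
    __m128 a_yzx = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 a_zxy = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 1, 0, 2));
    __m128 b_yzx = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 b_zxy = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 1, 0, 2));
    /* (a.yzx * b.zxy) - (a.zxy * b.yzx) gives the cross product per lane. */
    __m128 r = _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy), _mm_mul_ps(a_zxy, b_yzx));
    _mm_storeu_ps(out, r);
#else
    /* Scalar emulation for targets without SSE. */
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
    out[3] = 0.0f;
#endif
}
```

The macro only packs four 2-bit lane selectors into one immediate byte, so `_MM_SHUFFLE(3,2,1,0)` evaluates to 0xE4, the identity permutation.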
