#3557: SIMD operations in GHC.Prim
---------------------------------+------------------------------------------
Reporter: guest | Owner: vivian
Type: feature request | Status: new
Priority: normal | Milestone: _|_
Component: Compiler (NCG) | Version: 6.11
Keywords: | Testcase:
Blockedby: | Difficulty: Unknown
Os: Unknown/Multiple | Blocking:
Architecture: Unknown/Multiple | Failure: None/Unknown
---------------------------------+------------------------------------------
Changes (by vivian):
* cc: haskell.vivian.mcph...@… (added)
* owner: => vivian
Comment:
Okay, this might take a while. First set of questions:
I'm thinking of making a module `GHC.Prim.SSE` that contains the new ops.
1) SSE instructions are CPU-specific, so we need a way to check whether
the CPU supports the various SSE extensions (SSE, SSE2, SSE3, SSE4, ...),
e.g. with the CPUID assembler instruction.
a) If an extension is '''not''' supported, is the primop simply left
undefined, or do we hand-code a fallback definition for it? How do we
propagate this information to user code? We could have functions such as
{{{
sse :: Bool
sse2 :: Bool
sse3 :: Bool
...
}}}
b) This also affects cross-compilation: checking the CPU of the build
machine tells us nothing about the capabilities of the target machine.
c) Do we include memory-management primops? Specifically, there are
opcodes for bypassing the cache (e.g. non-temporal stores), which help
with live, streaming data.
2) One of the instructions (DPPS, added in SSE4.1) is a dot product that
takes (A0,A1,A2,A3) and (B0,B1,B2,B3) as packed 32-bit floats and returns
(A0*B0)+(A1*B1)+(A2*B2)+(A3*B3). This would work really well with a
streaming data type: for a vector of 32-bit floats, the first pass
computes the dot product of each 4-element chunk and the next pass sums
those results.
a) I seem to recall that `Data.Vector.Unboxed` is faster than
`Data.Vector.Storable`. My initial thought for packing/unpacking four
32-bit values into a 128-bit unit would be to use peek/poke. Do unboxed
tuples have any guarantees about alignment and so on in memory? It would
be great to have a function like
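{{{
-- Xmm128# is a proposed (not yet existing) primitive type for a 128-bit
-- SSE register; the body only typechecks if Xmm128# is taken to be the
-- unboxed 4-tuple of Float# itself.
packFloat :: Float# -> Float# -> Float# -> Float# -> Xmm128#
packFloat a b c d = (# a, b, c, d #)
}}}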
The real win comes when we have contiguous sequences of well-aligned
(16-byte) floats in memory, so we can fetch 128-bit chunks at a time,
bypassing the cache if necessary.
b) Also, what is the relationship between boxed and unboxed numbers? Is
`Float# -> Float` a no-op? The 'unboxed' vectors in Data.Vector.Unboxed
appear to hold not types like {{{Int#, Int8#}}} but rather {{{Int, Int8}}}.
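A minimal scalar sketch of what the dot-product instruction computes, and of the two-pass scheme described in question 2. It uses only existing `GHC.Exts` operations (`plusFloat#`, `timesFloat#` and the `F#` constructor), not an SSE primop; the unboxed 4-tuple stands in for the proposed `Xmm128#` type, and the names `dot4#`, `dot4`, `chunks4` and `dotProduct` are hypothetical. The summation order is fixed here, so rounding may differ from the hardware instruction. The `dot4` wrapper also bears on question 2b: `F#` boxes a `Float#` into a heap-allocated `Float`, so that conversion is not free in general, although GHC's optimiser often removes such boxes.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Scalar sketch only: no SSE primop is used. The unboxed 4-tuple stands in
-- for the hypothetical Xmm128# register type proposed in this ticket.

import GHC.Exts (Float (F#), Float#, plusFloat#, timesFloat#)

-- (A0*B0)+(A1*B1)+(A2*B2)+(A3*B3) on unboxed floats, summed pairwise.
dot4# :: (# Float#, Float#, Float#, Float# #)
      -> (# Float#, Float#, Float#, Float# #)
      -> Float#
dot4# (# a0, a1, a2, a3 #) (# b0, b1, b2, b3 #) =
  ((a0 `timesFloat#` b0) `plusFloat#` (a1 `timesFloat#` b1))
    `plusFloat#` ((a2 `timesFloat#` b2) `plusFloat#` (a3 `timesFloat#` b3))

-- Boxed wrapper: pattern matching on F# unboxes the inputs, and applying
-- F# to the result boxes it again (a heap allocation unless optimised away).
dot4 :: (Float, Float, Float, Float) -> (Float, Float, Float, Float) -> Float
dot4 (F# a0, F# a1, F# a2, F# a3) (F# b0, F# b1, F# b2, F# b3) =
  F# (dot4# (# a0, a1, a2, a3 #) (# b0, b1, b2, b3 #))

-- Split a list whose length is a multiple of 4 into 4-element chunks.
chunks4 :: [Float] -> [(Float, Float, Float, Float)]
chunks4 (a : b : c : d : rest) = (a, b, c, d) : chunks4 rest
chunks4 _                      = []

-- Two-pass dot product: pass 1 reduces each 4-element chunk with dot4,
-- pass 2 sums the per-chunk results. Inputs are truncated to the shorter
-- length (like zipWith) and zero-padded to a multiple of 4.
dotProduct :: [Float] -> [Float] -> Float
dotProduct xs ys = sum partials                          -- pass 2
  where
    n        = min (length xs) (length ys)
    padded   = ((n + 3) `div` 4) * 4                     -- round up to 4
    pad zs   = take padded (take n zs ++ repeat 0)
    partials = zipWith dot4 (chunks4 (pad xs)) (chunks4 (pad ys))  -- pass 1
```

Loaded into GHCi, `dotProduct [1,2,3,4,5] [1,1,1,1,1]` evaluates to `15.0`: the first pass yields the partial results 10 and 5 (the second chunk is zero-padded), and the second pass adds them. An SSE version would replace `dot4#` with a primop and the list traversal with aligned 128-bit loads.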
So my plan is to start by adding primops to GHC.Prim and the compiler,
then follow the code through the compiler to Cmm and the code generators,
making changes as necessary.
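For question 1a, the `sse :: Bool`-style flags could be decoded from the registers returned by CPUID leaf 1 (EAX = 1). A minimal sketch, assuming the raw ECX and EDX values are obtained elsewhere (for example through a small, hypothetical C shim called over the FFI, not shown); the record, field and function names are illustrative only. The bit positions are those documented for CPUID leaf 1. Per question 1b, a cross-compiler would have to take these flags from the target description rather than from CPUID on the build machine.

```haskell
import Data.Bits (testBit)
import Data.Word (Word32)

-- Hypothetical record of SSE-family flags, decoded from CPUID leaf 1.
data CpuFeatures = CpuFeatures
  { sse   :: Bool  -- EDX bit 25
  , sse2  :: Bool  -- EDX bit 26
  , sse3  :: Bool  -- ECX bit 0
  , ssse3 :: Bool  -- ECX bit 9
  , sse41 :: Bool  -- ECX bit 19 (SSE4.1, which includes DPPS)
  , sse42 :: Bool  -- ECX bit 20
  } deriving (Show, Eq)

-- Decode the ECX and EDX values returned by CPUID with EAX = 1.
-- Obtaining those values (inline asm or an FFI shim) is assumed, not shown.
decodeLeaf1 :: Word32 -> Word32 -> CpuFeatures
decodeLeaf1 ecx edx = CpuFeatures
  { sse   = testBit edx 25
  , sse2  = testBit edx 26
  , sse3  = testBit ecx 0
  , ssse3 = testBit ecx 9
  , sse41 = testBit ecx 19
  , sse42 = testBit ecx 20
  }

-- One possible answer to 1a: always provide the operation, but select the
-- SIMD version only when SSE4.1 is present, otherwise a hand-written
-- scalar fallback.
whenSSE41 :: CpuFeatures -> a -> a -> a
whenSSE41 feats simdVersion scalarVersion
  | sse41 feats = simdVersion
  | otherwise   = scalarVersion
```

For example, `decodeLeaf1 0x80001 0x6000000` reports SSE, SSE2, SSE3 and SSE4.1 but not SSSE3 or SSE4.2, so `whenSSE41` picks its first argument. Exposing plain `Bool`s keeps user code independent of how the flags were obtained.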
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3557#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs