A relevant trac ticket and wiki page can be found here: http://hackage.haskell.org/trac/ghc/ticket/3557 http://hackage.haskell.org/trac/ghc/wiki/VectorComputing
Vivian seems to now have partial patches for #3557. I haven't reviewed the discussion in the ticket or the patches in particular but it's good to see work here! In terms of using the FFI in practically any language, your typical "glue" to close to the metal languages is fairly simple: the C calling convention. In terms of utilizing LLVM bitcode, you could merely generate LLVM bitcode (which uses the vector type you linked to) by any number of means - the haskell bindings come to mind :) - and then use the LLVM toolchain to generate optimized assembly code for whatever fast primitives you need. You now have an object-file you can link into your resulting program which just has regular haskell code + FFI code (NB. cabal doesn't allow you to link in arbitrary object files I don't think.) LLVM by default uses the regular C calling convention for its functions so you shouldn't have to jump through any hoops. But, in the general case I don't think the regular FFI/library route is the way to go perhaps (or maybe I don't know exactly what you want.) The main reason I can think of is object-file boundaries. Presumably what we really want in the most general case is the ability to write haskell code and know that GHC will guarantee it uses SIMD machine instructions for it. We want to leave the register selection and instruction selection up to GHC in the final stages of code generation. We can certainly wrap lots of small SIMD primitives/micro-ops in shims of LLVM, but I wonder what the actual expense of doing that is. As far as I understand, GHC will at least have to spill/reload registers on the foreign call (since you're going to be FFI'ing out to a function, it'll need the C calling convention.) The C compiler/LLVM would compile the external object file you link in, and it contains your said primitives. GHC-generated code is going to exist in other object files, and I honestly don't think the linker will be able to see past the boundary and move the assembly code into the call site, but perhaps I am wrong. When using lots of micro-ops wrapped around the FFI, this (the register spilling + inability to inline) will probably add up, but whether or not it's enough to matter is up to debate and it requires evidence and benchmarking, so you need a basis to compare it on. I think it would be worth trying this sort of investigation to look at the overhead, because frankly, I could be wrong. Maybe FFI calls are incredibly cheap. I think the FFI + SIMD from the 'outside world' (C/LLVM) is a feasible operation if what you want to implement is a specific, fairly 'straight forward' operation that GHC can't generate great code for, where the net gains in speed are good (and thus outweigh the foreign calls.) In terms of general primitives we can build fast code on, with the guarantee it is backed by SIMD instructions in the code generator, I think building the functionality more directly into GHC as Vivian has done is the way to go. Another, more LLVM-specific way of overcoming these limitations, now that I think about it, is LLVM-based link time optimization. GHC already accepts raw LLVM bitcode files looking at the code for the GHC driver and will bring them into the final compilation/link scope. GHC's LLVM backend already directly outputs the LLVM bitcode format - so you could, feasibly, just create a hand-written LLVM bitcode file that sort of shims the LLVM vector type. This file can be directly 'linked into' another bitcode file by the LLVM toolchain - the file in question being the LLVM bitcode for your program, generated by GHC. This overcomes the object-boundary limitation, and will let the LLVM code generator/optimizer do more optimization. If the shims are small, I imagine they could be inlined and LLVM will take care of all the register selection - perhaps even eliminating any spills/loads. So that process would be similar to what you want: write FFI code calling out to an external function using the normal C calling convention. Write special LLVM bitcode file exposing the primitives you want to use from haskell. When GHC compiles your program, it generates LLVM bitcode. Link your primitive bitcode together with the generated code, and run the LLVM toolchain on it. This approach is much more general of course, far beyond SIMD primitives. We could apply LLVM-based LTO to the entire stack - and optimize the whole program basically at the end of the compilation stage (and it could be practically free, since LLVM will do this for us.) It would definitely alleviate some of these issues, and perhaps LLVM can generate really good code for the whole program. I'm not a hardcore GHC hacker by any stretch, I think that covers your question pretty well. I'd appreciate any corrections if I'm totally wrong somewhere, because this work interests me as well and I've thought about it a bit at this point. On Wed, Mar 30, 2011 at 4:43 PM, Louis Wasserman <[email protected]> wrote: > I'm curious: as I understand it, one of the purposes of the FFI is to allow > bindings to "close to the metal" languages. I'm curious what would be > involved in letting the FFI provide bindings to LLVM code in Haskell. > While I'm at it, I'll mention the most recent reason that this thought > occurred to me, which is the complaint in here that Haskell can't currently > get access to SIMDs, while LLVM does have that access, and I'm trying to > think about how I might go about gluing those together. > > Louis Wasserman > [email protected] > http://profiles.google.com/wasserman.louis > > _______________________________________________ > Glasgow-haskell-users mailing list > [email protected] > http://www.haskell.org/mailman/listinfo/glasgow-haskell-users > > -- Regards, Austin _______________________________________________ Glasgow-haskell-users mailing list [email protected] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
