Hi, Johan Tibell and I have been working on some new primops for GHC that would allow people to use fast optimized memcpys on unpinned memory. Some preliminary benchmarks have shown that on my Mac, a memcpy-based array copy is about 4x faster than a Haskell-based loop doing the same thing. The use cases I can think of would be in the vector package (possibly the New stuff in there, but also just as primitives in the classes) and Johan's current hash trie, that (as far as I can tell) is a "flatter" IntMap that uses a small (32-element) Array# at each node for wider branching (with a hash operation to allow for arbitrary keys).
The other issue I encountered while implementing these copy primops was the default element initialization in newArray#. It uses a cmm loop to iterate over all the array elements and set them to the default. It then uses another loop to set the cards to 0 (which could probably be replaced by memset, but that's another question). So basically, Johan's common operation is to clone arrays in his nodes during an update. This involves a call to newArray#, which initializes each element to a default in a loop, and then another loop to copy elements over from the existing array. Initial benchmarks of Johan's code show that a new copyArray# primitive makes it quite a bit faster, but even more convenient would be a cloneArray# primop that allocates a new array and uses memcpy to copy the existing elements over, without initializing first. This would remove two manual loops from the common case in the trie code, and replace them with a fast memcpy. What we're proposing is to add the following 8 primops: copyArray# :: Array# a -> Int# -> MutableArray# s a -> Int# -> Int# -> State# s -> State# s copyMutableArray# :: MutableArray# a -> Int# -> MutableArray# s a -> Int# -> Int# -> State# s -> State# s copyByteArray# :: ByteArray# -> Int# -> MutableByteArray# s -> Int# -> Int# -> State# s -> State# s copyMutableByteArray# :: MutableByteArray# s -> Int# -> MutableByteArray# s -> Int# -> Int# -> State# s -> State# s cloneArray# :: Array# s a -> State# s -> (# State s, MutableArray# s a #) cloneMutableArray# :: MutableArray# s a -> State# s -> (# State s, MutableArray# s a #) cloneByteArray# :: ByteArray# -> State# s -> (# State s, MutableByteArray# s #) cloneMutableByteArray# :: MutableByteArray# s -> State# s -> (# State s, MutableByteArray# s #) The MutableArray# versions would have to care about the card tables, and the ByteArray# versions wouldn't. The Mutable -> Mutable versions would check if the input arrays are the same and use memmove in that case, to allow fast shuffling around within the same array. Clones would be the merged allocation + copy operation I mentioned. I realize this is quite a few primops to propose, but the motivation for separating the mutable from the immutable copies and clones was that freeze and thaw both have side effects, so you can't get a mutable/immutable "view" of an array without changing it. I've implemented some of these and they're all fairly mechanical except for the code making sure that the cards are treated correctly for the MutableArray# versions. Does anyone have any comments on these primops or suggestions for an alternative set? Thanks, Daniel
_______________________________________________ Cvs-ghc mailing list [email protected] http://www.haskell.org/mailman/listinfo/cvs-ghc
