Hi,

Johan Tibell and I have been working on some new primops for GHC that would
allow people to use fast optimized memcpys on unpinned memory. Some
preliminary benchmarks have shown that on my Mac, a memcpy-based array copy
is about 4x faster than a Haskell-based loop doing the same thing. The use
cases I can think of would be in the vector package (possibly the New stuff
in there, but also just as primitives in the classes) and Johan's current
hash trie, that (as far as I can tell) is a "flatter" IntMap that uses a
small (32-element) Array# at each node for wider branching (with a hash
operation to allow for arbitrary keys).

The other issue I encountered while implementing these copy primops was the
default element initialization in newArray#. It uses a cmm loop to iterate
over all the array elements and set them to the default. It then uses
another loop to set the cards to 0 (which could probably be replaced by
memset, but that's another question).

So basically, Johan's common operation is to clone arrays in his nodes
during an update. This involves a call to newArray#, which initializes each
element to a default in a loop, and then another loop to copy elements over
from the existing array. Initial benchmarks of Johan's code show that a new
copyArray# primitive makes it quite a bit faster, but even more convenient
would be a cloneArray# primop that allocates a new array and uses memcpy to
copy the existing elements over, without initializing first. This would
remove two manual loops from the common case in the trie code, and replace
them with a fast memcpy.

What we're proposing is to add the following 8 primops:

copyArray#        ::        Array# a -> Int# -> MutableArray# s a -> Int# ->
Int# -> State# s -> State# s
copyMutableArray# :: MutableArray# a -> Int# -> MutableArray# s a -> Int# ->
Int# -> State# s -> State# s

copyByteArray#        ::        ByteArray#   -> Int# -> MutableByteArray# s
-> Int# -> Int# -> State# s -> State# s
copyMutableByteArray# :: MutableByteArray# s -> Int# -> MutableByteArray# s
-> Int# -> Int# -> State# s -> State# s


cloneArray#        ::        Array# s a -> State# s -> (# State s,
MutableArray# s a #)
cloneMutableArray# :: MutableArray# s a -> State# s -> (# State s,
MutableArray# s a #)

cloneByteArray#        ::        ByteArray#   -> State# s -> (# State s,
MutableByteArray# s #)
cloneMutableByteArray# :: MutableByteArray# s -> State# s -> (# State s,
MutableByteArray# s #)

The MutableArray# versions would have to care about the card tables, and the
ByteArray# versions wouldn't. The Mutable -> Mutable versions would check if
the input arrays are the same and use memmove in that case, to allow fast
shuffling around within the same array. Clones would be the merged
allocation + copy operation I mentioned.

I realize this is quite a few primops to propose, but the motivation for
separating the mutable from the immutable copies and clones was that freeze
and thaw both have side effects, so you can't get a mutable/immutable "view"
of an array without changing it.


I've implemented some of these and they're all fairly mechanical except for
the code making sure that the cards are treated correctly for the
MutableArray# versions.

Does anyone have any comments on these primops or suggestions for an
alternative set?

Thanks,
Daniel
_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc

Reply via email to