Hi Roman,

On Tue, Jan 25, 2011 at 6:35 PM, Roman Leshchinskiy <[email protected]> 
wrote:
> Daniel Peebles wrote:
>>
>> Johan Tibell and I have been working on some new primops for GHC that
>> would allow people to use fast optimized memcpys on unpinned memory.
>
> Why not just use memcpy?

I'm working with unpinned memory, so memcpy is not an option: the GC
might move the memory during the foreign call. I also want to avoid
filling the array twice (first with an initial element I don't care
about, then with the elements of another array).

There's some research on /derived pointers/ (e.g. in HotSpot) that
would allow traversals of arbitrary memory using pointer arithmetic
and still let the GC move objects around. We don't have anything like
that in GHC at the moment.

I don't want to use pinned memory as the arrays I'm dealing with are
pretty small (e.g. 32 elements). Even though they're small, using a
copyArray# primop is faster than a loop using indexArray#/writeArray#.
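
A minimal sketch of the two copying strategies being compared. It
assumes the copyArray# signature that recent GHC releases export from
GHC.Exts, which may differ from what was being prototyped in this
thread. The Arr wrapper and the fromList/toList helpers are
hypothetical names used only for illustration. Note that both variants
still allocate with newArray#, so the destination is first filled with
a placeholder and then overwritten.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.ST (ST (..), runST)

-- Hypothetical boxed wrapper so an Array# can be returned from ST.
data Arr a = Arr (Array# a)

-- Build an array from a list (illustration helper).
fromList :: [a] -> Arr a
fromList xs = runST (ST $ \s0 ->
  case length xs of
    I# n -> case newArray# n (error "unset") s0 of
      (# s1, m #) -> case fill m 0# xs s1 of
        s2 -> case unsafeFreezeArray# m s2 of
          (# s3, arr #) -> (# s3, Arr arr #))
  where
    fill :: MutableArray# s a -> Int# -> [a] -> State# s -> State# s
    fill _ _ [] s = s
    fill m i (y : ys) s = fill m (i +# 1#) ys (writeArray# m i y s)

-- Read an array back into a list (illustration helper).
toList :: Arr a -> [a]
toList (Arr arr) =
  [ case indexArray# arr i of (# x #) -> x
  | I# i <- [0 .. I# (sizeofArray# arr) - 1] ]

-- Element-by-element copy: one indexArray#/writeArray# pair per element.
loopCopy :: Array# a -> MutableArray# s a -> Int# -> State# s -> State# s
loopCopy src dst n = go 0#
  where
    go i s
      | isTrue# (i >=# n) = s
      | otherwise =
          case indexArray# src i of
            (# x #) -> go (i +# 1#) (writeArray# dst i x s)

-- Copy using the hand-written loop.
copyWithLoop :: Arr a -> Arr a
copyWithLoop (Arr src) = runST (ST $ \s0 ->
  case sizeofArray# src of
    n -> case newArray# n (error "placeholder") s0 of
      (# s1, dst #) -> case loopCopy src dst n s1 of
        s2 -> case unsafeFreezeArray# dst s2 of
          (# s3, out #) -> (# s3, Arr out #))

-- Copy using the bulk primop.
copyWithPrimop :: Arr a -> Arr a
copyWithPrimop (Arr src) = runST (ST $ \s0 ->
  case sizeofArray# src of
    n -> case newArray# n (error "placeholder") s0 of
      (# s1, dst #) -> case copyArray# src 0# dst 0# n s1 of
        s2 -> case unsafeFreezeArray# dst s2 of
          (# s3, out #) -> (# s3, Arr out #))
```

Both functions produce the same result; the difference is only in how
the elements get from the source to the destination.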

In addition, I'd like to avoid unnecessarily filling the arrays with a
default element just to overwrite them right after. Speaking of array
initialization, I'm not sure the CMM compiler is up to optimizing the
initialization loop well either. My understanding is that the CMM
compiler is not as strong as e.g. GCC when it comes to optimizing
loops.

>> copyByteArray#        ::        ByteArray#   -> Int# -> MutableByteArray#
>> s -> Int# -> Int# -> State# s -> State# s
>> copyMutableByteArray# :: MutableByteArray# s -> Int# -> MutableByteArray#
>> s -> Int# -> Int# -> State# s -> State# s
>
> These are just instances of memcpy. FWIW, very similar operations are
> provided by the primitive package.

The GC issue applies here too. There are real users of unpinned
ByteArray#s, e.g. Text.

>> cloneArray#        ::        Array# s a -> State# s -> (# State s,
>> MutableArray# s a #)
>> cloneMutableArray# :: MutableArray# s a -> State# s -> (# State s,
>> MutableArray# s a #)
>
> It would be nice if those could be used on array slices. Maybe this:
>
> cloneArray# :: Array# a -> Int# -> Int# -> State# s -> Array# a
> cloneMutableArray# :: MutableArray# s a -> Int# -> Int# -> State# s -> (#
> State# s, MutableArray# s a #)
> freezeArray# :: MutableArray# s a -> Int# -> Int# -> State# s -> (# State#
> s, Array# a #)
> thawArray# :: Array# a -> Int# -> Int# -> State# s -> (# State# s,
> MutableArray# s a #)

Supporting slicing would be nice. I'd prefer if we had e.g.

cloneArray# :: Array# a -> Int# -> Int# -> State# s -> (# State# s,
MutableArray# s a #)

as you might want to mutate the cloned array before freezing it. The
use case I really want to support is updating a single array element,
which would entail first cloning the array, then writing an element,
and finally freezing the array.
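
The clone/write/freeze sequence could look roughly like the sketch
below. It assumes the primops as they exist in recent GHC.Exts, where
the copying operation with the signature proposed above is exposed as
thawArray#; that naming is an assumption relative to this thread. The
Arr wrapper, updateAt, fromList and toList are hypothetical names, and
no bounds checking is done.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.ST (ST (..), runST)

-- Hypothetical boxed wrapper so an Array# can be returned from ST.
data Arr a = Arr (Array# a)

-- Copy the array into a fresh mutable one, overwrite one slot, then
-- freeze in place. The original array is left untouched.
-- The index is not bounds-checked.
updateAt :: Arr a -> Int -> a -> Arr a
updateAt (Arr arr) (I# i) x = runST (ST $ \s0 ->
  case thawArray# arr 0# (sizeofArray# arr) s0 of
    (# s1, marr #) -> case writeArray# marr i x s1 of
      s2 -> case unsafeFreezeArray# marr s2 of
        (# s3, arr' #) -> (# s3, Arr arr' #))

-- Build an array from a list (illustration helper).
fromList :: [a] -> Arr a
fromList xs = runST (ST $ \s0 ->
  case length xs of
    I# n -> case newArray# n (error "unset") s0 of
      (# s1, m #) -> case fill m 0# xs s1 of
        s2 -> case unsafeFreezeArray# m s2 of
          (# s3, arr #) -> (# s3, Arr arr #))
  where
    fill :: MutableArray# s a -> Int# -> [a] -> State# s -> State# s
    fill _ _ [] s = s
    fill m i (y : ys) s = fill m (i +# 1#) ys (writeArray# m i y s)

-- Read an array back into a list (illustration helper).
toList :: Arr a -> [a]
toList (Arr arr) =
  [ case indexArray# arr i of (# x #) -> x
  | I# i <- [0 .. I# (sizeofArray# arr) - 1] ]
```

The copy in updateAt never fills the new array with a default element:
every slot comes straight from the source, and only the updated slot is
written a second time.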

> Note that freezeArray# and thawArray# would be safe, i.e., would always
> copy.

That sounds awfully expensive!

> Perhaps a block write would also be useful:
>
> fillMutableArray# :: MutableArray# s a -> Int# -> Int# -> a -> State# s ->
> (# State# s, () #)

That could be useful.

>> cloneByteArray#        ::        ByteArray#   -> State# s -> (# State s,
>> MutableByteArray# s #)
>> cloneMutableByteArray# :: MutableByteArray# s -> State# s -> (# State s,
>> MutableByteArray# s #)
>
> Aren't these just newMutableByteArray# followed by copy? Why do you want
> them to be primops? There shouldn't be any speed advantage.

You're right (assuming that we don't zero the memory on allocation).
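
For illustration, a clone written as an allocation followed by a bulk
copy might look like this. It is only a sketch: it assumes the
newByteArray# and copyByteArray# primops as exported by recent
GHC.Exts, and the BA wrapper, cloneBA, fromInts, toInts and cloneInts
are hypothetical names.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Data.Bits (finiteBitSize)
import GHC.Exts
import GHC.ST (ST (..), runST)

-- Hypothetical boxed wrapper so a ByteArray# can be returned from ST.
data BA = BA ByteArray#

-- Clone = allocate an uninitialised buffer, then copy all bytes into it.
cloneBA :: ByteArray# -> State# s -> (# State# s, MutableByteArray# s #)
cloneBA src s0 =
  case sizeofByteArray# src of
    n -> case newByteArray# n s0 of
      (# s1, dst #) -> case copyByteArray# src 0# dst 0# n s1 of
        s2 -> (# s2, dst #)

-- Size of an Int in bytes on this platform.
wordBytes :: Int
wordBytes = finiteBitSize (0 :: Int) `div` 8

-- Pack a list of Ints into a byte array (illustration helper).
fromInts :: [Int] -> BA
fromInts xs = runST (ST $ \s0 ->
  case length xs * wordBytes of
    I# nbytes -> case newByteArray# nbytes s0 of
      (# s1, m #) -> case fill m 0# xs s1 of
        s2 -> case unsafeFreezeByteArray# m s2 of
          (# s3, ba #) -> (# s3, BA ba #))
  where
    fill :: MutableByteArray# s -> Int# -> [Int] -> State# s -> State# s
    fill _ _ [] s = s
    fill m i (I# y : ys) s = fill m (i +# 1#) ys (writeIntArray# m i y s)

-- Unpack a byte array into a list of Ints (illustration helper).
toInts :: BA -> [Int]
toInts (BA ba) =
  [ I# (indexIntArray# ba i)
  | I# i <- [0 .. I# (sizeofByteArray# ba) `div` wordBytes - 1] ]

-- Round trip: pack, clone, freeze the clone, unpack.
cloneInts :: [Int] -> [Int]
cloneInts xs = case fromInts xs of
  BA src -> toInts (runST (ST $ \s0 ->
    case cloneBA src s0 of
      (# s1, m #) -> case unsafeFreezeByteArray# m s1 of
        (# s2, out #) -> (# s2, BA out #)))
```

Since newByteArray# does not initialise the buffer, the copy is the
only pass over the destination memory.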

> IIRC, memmove just calls memcpy if the ranges don't overlap so always
> calling memmove and having it do the check should be just as fast.

That would be a nice simplification. We should double check that this
is indeed what happens in e.g. libc.

Cheers,
Johan

_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc
