Am Dienstag, 17. Juni 2008 20:35 schrieb Dan Doel: > On Tuesday 17 June 2008, Daniel Fischer wrote: > > I've experimented a bit and found that Ptr is faster for small arrays > > (only very slightly so if compiled with -fvia-C -optc-O3), but ByteArr > > performs much better for larger arrays > > ... > > The GC time for the Addr# version is frightening > > I had an entire e-mail written about what a bizarre and interesting result > you'd just found, but unfortunately, I then remembered exactly how the > array gets filled in the Ptr version. Namely: > > (Ptr a) <- newArray [0..n-1]
Ouch. I should've looked at both sources, that would have been obvious then :) > > Which, I assume does something terrible, like calling length to get the > size needed for the array, while also needing the values after array > creation to feed into it. For small arrays like the ones I'd been testing > with, this doesn't matter, because the work done on the list is negligible. > However, when you get to very large lists (100,000 elements and above, > apparently) this starts causing massive space leaks, which explains the > terrible GC behavior we were seeing. > > If you change the benchmark like so: > > bench :: Int -> Int -> IO () > bench k n = do (Ptr a) <- mallocArray n :: IO (Ptr Int) > fill a n > replicateM_ k (reverse a 0 (n-1)) > > fill :: Addr# -> Int -> IO () > fill a (I# n) = IO (go 0#) > where > go i s > > | i <# n = case writeIntOffAddr# a i i s of s -> go (i +# 1#) s > | otherwise = (# s, () #) > > The space leak goes away, and the runtimes stay consistent. Up to around > 10,000 elements, Ptr hovers around 6s, and ByteArray (-fasm) stays around > 11. At 100,000, Ptr jumps to 12-13s, and ByteArray goes to 14-16 and stays > there (again, I imagine due to running into bad cache effects at that > level). This is all for size * iterations = 2.5 billion on my machine. Since my computer is slower, I confined my tests to size*iterations ~= 10^9. Mostly, I find no noticeable difference, between 0.2 and 0.5 %, sometimes one faster, sometimes the other. I have the impression that the Ptr version is the faster more often than the ByteArr version, but the tendency isn't very strong. That applies to both compiled with -O2 -fvia-C -optc-O3. Compiling with -O2 -fasm doesn't make a noticeable difference for Ptr, but is about 13% slower for ByteArr when the arrays are large (too large for the cache, I suppose) and about 50% slower for small arrays. So what I can read off that is that the native code generator still has to catch up for such code, not that either way of implementing arrays is inherently faster. > > A good catch anyhow, though. That could explain why Simon Marlow was seeing > the Addr# version as slower, since he was using a large array, and thus > work done on the list could have contributed significantly (although MUT > time was higher with Ptr, so it would have had to contribute work there, > not just GC thrashing). > > Thanks for the input, > -- Dan Cheers, Daniel _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users