#2374: MutableByteArray# is slower than Addr#
----------------------------------+-----------------------------------------
Reporter: dolio | Owner:
Type: bug | Status: new
Priority: normal | Component: Compiler (NCG)
Version: 6.8.2 | Severity: minor
Keywords: performance array | Testcase:
Architecture: x86_64 (amd64) | Os: Linux
----------------------------------+-----------------------------------------
I've filed this against the native code generator, even thought it still
occurs in my experience with the via-C path. It is significantly more
pronounced with the NCG path, and so, I imagine, is likely to be fixed by
working on that. However, feel free to reclassify if appropriate.
When attempting to rewrite the fannkuch benchmark of the computer language
benchmarks game to use more realistic code, I discovered that I am
currently unable to do so. At least one piece of the puzzle is that the
current entry uses Ptr and Addr# for arrays, while the current mutable
array libraries (ST arrays and uvector) use MutableByteArray# underneath,
and it happens that reads/writes on the latter are slower than the former.
After some discussion on the glasgow-haskell-users list, I believe I've
worked the kinks out of my (even more micro) benchmark programs, and will
attach them here. They accept two numbers, k and n, and reverse an array
of size-n k times. The larger fannkuch benchmark works on small arrays
(reversing, shifting and copying), so using small arrays with a very large
number of iterations is more representative of it. However, my benchmark
doesn't show terribly much variation when one varies the parameters by
corresponding factors (although there is a drop in performance when the
size of the array nears the size of the CPU cache, naturally).
I've test machine is an AMD Athlon 64 3000+ running in 64-bit mode with 2
GB of ram. My operating system is Linux, kernel 2.6.24.
I've mainly done runs where n * k = 2.5 billion, as that takes
approximately as much time as the MutableByteArray# version of the
fannkuch benchmark, around 11 seconds for 250 million iterations with an
array size of 10. The Addr# version of the benchmark runs with the same
parameters in around 7 seconds (which is faster than the actual fannkuch
benchmark, but I digress).
I'll attach my two micro benchmarks. ByteArr.hs is the MutableByteArray#
reversal benchmark. Ptr.hs is the Addr#/Ptr version. I may also attach
full fannkuch implementations as another data point (the original is
already Addr#/Ptr, and I've written a MutableByteArray# implementation; I
just need to do some proofreading of the latter to make sure it doesn't
have any discrepancies).
I've also not done much investigation as to the source of the problem,
besides looking at the core of various higher level programs to track
things down this far. I'll attempt to delve further into the issue,
investigating the C-- and assembly if my abilities allow, and update here
as I find things.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/2374>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs