#2374: MutableByteArray# is slower than Addr#
----------------------------------+-----------------------------------------
    Reporter:  dolio              |       Owner:                
        Type:  bug                |      Status:  new           
    Priority:  normal             |   Component:  Compiler (NCG)
     Version:  6.8.2              |    Severity:  minor         
    Keywords:  performance array  |    Testcase:                
Architecture:  x86_64 (amd64)     |          Os:  Linux         
----------------------------------+-----------------------------------------
 I've filed this against the native code generator, even thought it still
 occurs in my experience with the via-C path. It is significantly more
 pronounced with the NCG path, and so, I imagine, is likely to be fixed by
 working on that. However, feel free to reclassify if appropriate.

 When attempting to rewrite the fannkuch benchmark of the computer language
 benchmarks game to use more realistic code, I discovered that I am
 currently unable to do so. At least one piece of the puzzle is that the
 current entry uses Ptr and Addr# for arrays, while the current mutable
 array libraries (ST arrays and uvector) use MutableByteArray# underneath,
 and it happens that reads/writes on the latter are slower than the former.

 After some discussion on the glasgow-haskell-users list, I believe I've
 worked the kinks out of my (even more micro) benchmark programs, and will
 attach them here. They accept two numbers, k and n, and reverse an array
 of size-n k times. The larger fannkuch benchmark works on small arrays
 (reversing, shifting and copying), so using small arrays with a very large
 number of iterations is more representative of it. However, my benchmark
 doesn't show terribly much variation when one varies the parameters by
 corresponding factors (although there is a drop in performance when the
 size of the array nears the size of the CPU cache, naturally).

 I've test machine is an AMD Athlon 64 3000+ running in 64-bit mode with 2
 GB of ram. My operating system is Linux, kernel 2.6.24.

 I've mainly done runs where n * k = 2.5 billion, as that takes
 approximately as much time as the MutableByteArray# version of the
 fannkuch benchmark, around 11 seconds for 250 million iterations with an
 array size of 10. The Addr# version of the benchmark runs with the same
 parameters in around 7 seconds (which is faster than the actual fannkuch
 benchmark, but I digress).

 I'll attach my two micro benchmarks. ByteArr.hs is the MutableByteArray#
 reversal benchmark. Ptr.hs is the Addr#/Ptr version. I may also attach
 full fannkuch implementations as another data point (the original is
 already Addr#/Ptr, and I've written a MutableByteArray# implementation; I
 just need to do some proofreading of the latter to make sure it doesn't
 have any discrepancies).

 I've also not done much investigation as to the source of the problem,
 besides looking at the core of various higher level programs to track
 things down this far. I'll attempt to delve further into the issue,
 investigating the C-- and assembly if my abilities allow, and update here
 as I find things.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/2374>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to