Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
On Sun, 2007-12-16 at 15:21 -0800, Don Stewart wrote: An updated bytestring library is at : http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-0.9.0.2 Enjoy! :) Thanks! -Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
Don Stewart wrote: cnt:: B.ByteString - Int64 cnt bs = B.length (B.filter (== ' ') bs) [...] Now, this memory result is suspicious, I wonder if the now obsolete 'array fusion' is messing things up. In Data.ByteString.Lazy, we have: Are you sure you have a fusible length? I think I only added it to NDP after stream fusion went in. Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
rl: Don Stewart wrote: cnt:: B.ByteString - Int64 cnt bs = B.length (B.filter (== ' ') bs) [...] Now, this memory result is suspicious, I wonder if the now obsolete 'array fusion' is messing things up. In Data.ByteString.Lazy, we have: Are you sure you have a fusible length? I think I only added it to NDP after stream fusion went in. It was the array fusion from prior to the stream stuff (foldl . filter). Which was in fact messing up the simplifier. I've fixed this, (turned off array fusion for now), and things are back to normal. (well, much faster, actually). -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
I've had a look at how some of the code was being compiled for strict and lazy bytestrings, and also which rules weren't firing. With some small tweaks the code seems back in good shape. An updated bytestring library is at : http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-0.9.0.2 Enjoy! :) Summary: the suspicious lazy bytestring program works now. (constant space, and fastest overall, as expected originally) Program 1, lazy bytestring length . filter Yesterday: ./A +RTS -sstderr 150M 1.01s user 0.10s system 98% cpu 1.123 total 40M allocated * Today (fixed!): ./A +RTS -sstderr 150M 0.26s user 0.06s system 96% cpu 0.332 total 2M allocated Reason, deprecated array fusion mucking up the optimiser. I think we can close this regression. Also, I had a look at Program 3: lazy bytestring, custom loop Unchanged. 2.4s, constant space. This was a bit slow. Further investigation shows lots of unnecessary bounds checks, as we take apart the Chunk lazy bytestring type, then test and continue. This representation was chosen to make it possible to process chunks efficiently, so that we can avoid these bounds check. Something like this instead: cnt :: Int - B.ByteString - Int cnt n B.Empty= n cnt n (B.Chunk x xs) = cnt (n + cnt_strict 0 x) xs -- process lazy spine -- now we can process a chunk without checking for Empty where cnt_string !i !s-- then strict chunk | S.null s = i | c == ' ' = cnt_strict (i+1) t | otherwise = cnt_strict i t where (c,t) = (S.w2c (S.unsafeHead s), S.unsafeTail s) -- no bounds check main = do s - B.getContents; print (cnt 0 s) Let's us avoid redundant checks for Empty, while allowing 'go' to avoid unnecessary checks for the empty strict bytestring. This is some 4x faster. This alternating between lazy spines and strict chunk processing is the best way to get reliable performance from lazy bytestring custom loops. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
What do you think the relative speeds are of the six small haskell programs at the end of this email? All they do is read from stdin and count the number of spaces they see. There are two that use strict bytestrings, two that use lazy bytestrings, and two that use the standard Haskell strings. Three use a recursive function with an accumulator parameter and three use a foldl with a lambda function. Say the fastest one takes the time 1. How much time will the others take? And how about memory? How much memory do you think they require? Let's say we feed a 150MB(*) file into each of them, how many megabytes do you think they end up using (as seen from the OS, not in terms of how big the live heap is)? I'm going to post full benchmarks + analysis on Wednesday. -Peter *) hardddisk megabytes. The file is 15034 bytes ≈ 143 mebibytes. PS: For extra credit, what do you think is the peak memory use for this program when given an input file of 150MB? {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B import GHC.Int (Int64) -- note that D.BS.Lazy.Char8.length is ByteString - Int64 -- D.BS.C8.length is ByteString - Int cnt :: B.ByteString - Int64 cnt bs = B.length (B.filter (== ' ') bs) main = do s - B.getContents print (cnt s) == hs/space-bs-c8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) == hs/space-bslc8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) == hs/space-x-acc-1.hs: {-# LANGUAGE BangPatterns #-} cnt :: Int - String - Int cnt !acc bs = if null bs then acc else cnt (if head bs == ' ' then acc+1 else acc) (tail bs) main = do s - getContents print (cnt 0 s) == hs/space-bs-c8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: B.ByteString - Int cnt bs = B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) == hs/space-bslc8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B cnt :: B.ByteString - Int cnt bs = B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) == hs/space-x-foldl.hs: {-# LANGUAGE BangPatterns #-} cnt :: String - Int cnt bs = foldl (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - getContents print (cnt s) == ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
On Sat, 2007-12-15 at 09:25 +0100, Peter Lund wrote: What do you think the relative speeds are of the six small haskell programs at the end of this email? Ok, I presume this is a guessing game and we're supposed to just look at the code without running and timing them. All they do is read from stdin and count the number of spaces they see. There are two that use strict bytestrings, two that use lazy bytestrings, and two that use the standard Haskell strings. Three use a recursive function with an accumulator parameter and three use a foldl with a lambda function. Say the fastest one takes the time 1. How much time will the others take? And how about memory? How much memory do you think they require? Let's say we feed a 150MB(*) file into each of them, how many megabytes do you think they end up using (as seen from the OS, not in terms of how big the live heap is)? I'm going to post full benchmarks + analysis on Wednesday. Right'o. I'll have a go. Lets see if I can't embarrass myself with being completely inaccurate. PS: For extra credit, what do you think is the peak memory use for this program when given an input file of 150MB? Ok. {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B import GHC.Int (Int64) -- note that D.BS.Lazy.Char8.length is ByteString - Int64 -- D.BS.C8.length is ByteString - Int Yes, because strict bytestring cannot be bigger than the size of virtual memory and with ghc at least, Int tracks the size of the machine pointer. cnt :: B.ByteString - Int64 cnt bs= B.length (B.filter (== ' ') bs) main = do s - B.getContents print (cnt s) Hmm. So that should work in constant memory, a few 64 chunks at once. I'd expect this to be pretty fast. == hs/space-bs-c8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) This uses strict bytestrings so will use at least 150Mb and that'll make it a good deal slower. In fact it'll be worse than that since getContents does not know in advance how big the input will be so it has to play the doubling and copying game. So it'll end up copying all the data roughly twice. cnt is strict and tail recursive so that shouldn't be any problem, though it's probably not as fast as the first length . filter since head, tail, null all have to do bounds checks. == hs/space-bslc8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) For the same reason as above, I'd expect this cnt to be slower than B.length . B.filter (== ' ') == hs/space-x-acc-1.hs: {-# LANGUAGE BangPatterns #-} cnt :: Int - String - Int cnt !acc bs = if null bs then acc else cnt (if head bs == ' ' then acc+1 else acc) (tail bs) main = do s - getContents print (cnt 0 s) Lazy, so constant memory use, but much higher constant factors due to using String. == hs/space-bs-c8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: B.ByteString - Int cnt bs= B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) This is of course still strict so that's going to make the reading slow. This is a manually fused B.length . B.filter (== ' ') which hopefully is the same speed as the automatically fused one if the fusion is working ok. If not, then the B.length . B.filter (== ' ') will be doing a extra copy, and memory writes are expensive. == hs/space-bslc8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B cnt :: B.ByteString - Int cnt bs= B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) As above but now in constant memory space. == hs/space-x-foldl.hs: {-# LANGUAGE BangPatterns #-} cnt :: String - Int cnt bs= foldl (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - getContents print (cnt s) Oh, no! not foldl that's a killer. Ok, so best way to summarise I think is to organise by data type since I think that'll dominate. So I think the lazy bytestring versions will be fastest due to having the best memory access patterns and doing the least copying. I think the foldl's will
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
On Sat, 2007-12-15 at 14:34 +, Duncan Coutts wrote: Ok, I presume this is a guessing game and we're supposed to just look at the code without running and timing them. Precisely :) All they do is read from stdin and count the number of spaces they see. There are two that use strict bytestrings, two that use lazy bytestrings, and two that use the standard Haskell strings. Three use a recursive function with an accumulator parameter and three use a foldl with a lambda function. Say the fastest one takes the time 1. How much time will the others take? And how about memory? How much memory do you think they require? Let's say we feed a 150MB(*) file into each of them, how many megabytes do you think they end up using (as seen from the OS, not in terms of how big the live heap is)? I'm going to post full benchmarks + analysis on Wednesday. Right'o. I'll have a go. Lets see if I can't embarrass myself with being completely inaccurate. Thanks for biting! You were, thankfully, only almost completely inaccurate ;) PS: For extra credit, what do you think is the peak memory use for this program when given an input file of 150MB? Hmm. So that should work in constant memory, a few 64 chunks at once. I'd expect this to be pretty fast. You are right about the speed. Can you guess a number in kilobytes? == hs/space-bs-c8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) This uses strict bytestrings so will use at least 150Mb and that'll make it a good deal slower. In fact it'll be worse than that since getContents does not know in advance how big the input will be so it has to play the doubling and copying game. So it'll end up copying all the data roughly twice. cnt is strict and tail recursive so that shouldn't be any problem, though it's probably not as fast as the first length . filter since head, tail, null all have to do bounds checks. You are right about the memory. It is actually slightly faster than the extra credit (length/filter combination) above. == hs/space-bslc8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B cnt :: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) For the same reason as above, I'd expect this cnt to be slower than B.length . B.filter (== ' ') It is slower but not for the same reason as above. == hs/space-x-acc-1.hs: {-# LANGUAGE BangPatterns #-} cnt :: Int - String - Int cnt !acc bs = if null bs then acc else cnt (if head bs == ' ' then acc+1 else acc) (tail bs) main = do s - getContents print (cnt 0 s) Lazy, so constant memory use, but much higher constant factors due to using String. Spot on. == hs/space-bs-c8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt :: B.ByteString - Int cnt bs = B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) This is of course still strict so that's going to make the reading slow. Nope. This is a manually fused B.length . B.filter (== ' ') which hopefully is the same speed as the automatically fused one if the fusion is working ok. If not, then the B.length . B.filter (== ' ') will be doing a extra copy, and memory writes are expensive. == hs/space-bslc8-foldlx-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B cnt :: B.ByteString - Int cnt bs = B.foldl' (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - B.getContents print (cnt s) As above but now in constant memory space. Nope. == hs/space-x-foldl.hs: {-# LANGUAGE BangPatterns #-} cnt :: String - Int cnt bs = foldl (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - getContents print (cnt s) Oh, no! not foldl that's a killer. You think it's worse than the program just above? Ok, so best way to summarise I think is to organise by data type since I think that'll dominate. So I think the lazy bytestring versions will be fastest due to having the best memory access patterns and doing the least copying. I think the foldl's will be faster than the explicit accumulators due to having fewer bounds checks. space-bslc8-foldlx-1
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
firefly: What do you think the relative speeds are of the six small haskell programs at the end of this email? All they do is read from stdin and count the number of spaces they see. There are two that use strict bytestrings, two that use lazy bytestrings, and two that use the standard Haskell strings. Three use a recursive function with an accumulator parameter and three use a foldl with a lambda function. Say the fastest one takes the time 1. How much time will the others take? And how about memory? How much memory do you think they require? Let's say we feed a 150MB(*) file into each of them, how many megabytes do you think they end up using (as seen from the OS, not in terms of how big the live heap is)? I'm going to post full benchmarks + analysis on Wednesday. How are you compiling these programs, by the way? ghc-6.8.2 -O2 ? (-O2 is required for bytestrings :) -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
On Sat, 2007-12-15 at 11:59 -0800, Don Stewart wrote: firefly: What do you think the relative speeds are of the six small haskell programs at the end of this email? All they do is read from stdin and count the number of spaces they see. There are two that use strict bytestrings, two that use lazy bytestrings, and two that use the standard Haskell strings. Three use a recursive function with an accumulator parameter and three use a foldl with a lambda function. Say the fastest one takes the time 1. How much time will the others take? And how about memory? How much memory do you think they require? Let's say we feed a 150MB(*) file into each of them, how many megabytes do you think they end up using (as seen from the OS, not in terms of how big the live heap is)? I'm going to post full benchmarks + analysis on Wednesday. How are you compiling these programs, by the way? ghc-6.8.2 -O2 ? (-O2 is required for bytestrings :) With -O2. I have measured with 6.8.1, 6.9.20071119, 6.9.20071208 (approx), and 6.9.200712xx (as of yesterday or today). The picture changes very little with the compiler, if it changes at all. I have run them on three very different microarchitectures (2GHz Athlon64 3000+, 1667MHz Core Duo, 600MHz Pentium III). All the measurements are scripted. 'make phase1' compiles the benchmarks, creates input files of various sizes, and runs each benchmark once to gather information about memory use (peak RSS + ghc RTS' own information about allocations and gcs), page faults, and an strace. This phase is not timing sensitive so I can browse the web and listen to the music etc. while running it. 'make phase2' runs each benchmark a number of times, calculates the average time for each + the relative size of the standard deviation + how much user+sys is different from real (as reported by bash' built-in time command). A report with barcharts indicating relative time and relative peak RSS is generated in either pure ASCII or in UTF-8 (with fractional-width block chars so the charts look nice and have high resolution). If the measurements are deemed to be bad (too high standard deviation or user+sys doesn't add up to real) then the barchart is done with '%' characters. The quality indicators for each timing test are always printed out next to each bar, so we know how close we are to being perfect or, conversely, how bad the measurements are. There is a script that analyzes the I/O pattern and sums it up (as 4375 x read(3, 4096, ...) = 4096 followed by 1 x read(3, 4096, ...) = 1242 followed by 1 x read(3, 4096, ...) = 0 an similar). There are a set of simple I/O programs in C so we can compare ghc's performance with speed of light, and so different I/O strategies can be compared in a cleaner, purer form. There are also 'make cache' (runs cachegrind), 'make hs/.doc' (creates a file with source code + core + stg + c-- + assembly + times for a given benchmark), etc. 'make sysinfo' creates a file with information about the Linux distribution used (/etc/lsb-release), kernel version (uname -a), CPU (/proc/cpuinfo), and compilers used (ghc --version, gcc --version). 'make zipdata' creates a zip file of about 20K with all the raw time measurements + the sysinfo. I also have a set of scripts that installs each ghc version so 1) it is easy for me to repeat the tests on various machines and 2) you can see exactly which ghc versions I use and repeat the experiments yourself. -Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
firefly: What do you think the relative speeds are of the six small haskell programs at the end of this email? All they do is read from stdin and count the number of spaces they see. There are two that use strict bytestrings, two that use lazy bytestrings, and two that use the standard Haskell strings. Three use a recursive function with an accumulator parameter and three use a foldl with a lambda function. Say the fastest one takes the time 1. How much time will the others take? And how about memory? How much memory do you think they require? Let's say we feed a 150MB(*) file into each of them, how many megabytes do you think they end up using (as seen from the OS, not in terms of how big the live heap is)? I'm going to post full benchmarks + analysis on Wednesday. -Peter *) hardddisk megabytes. The file is 15034 bytes ≈ 143 mebibytes. PS: For extra credit, what do you think is the peak memory use for this program when given an input file of 150MB? Well, I'm not going to wait till Wednesday for the numbers! $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.2 $ du -hs 150M 150M Program 1: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as B import GHC.Int (Int64) -- note that D.BS.Lazy.Char8.length is ByteString - Int64 -- D.BS.C8.length is ByteString - Int cnt:: B.ByteString - Int64 cnt bs = B.length (B.filter (== ' ') bs) main = do s - B.getContents print (cnt s) Ok, so lazy bytestrings. Should be constant space use, but two traversals (since there's no lazy bytestring fusion, over each lazy chunk). Not perfect, but should be ok. length will consume chunks as they're produced by filter. ** Prediction: Constant ~3M space use (runtime, plus small overhead) Fast, due to chunk-wise processing. ** Result: $ ghc -O2 A.hs -o A --make $ time ./A 150M +RTS -sstderr ./A +RTS -sstderr 150M 1.01s user 0.10s system 98% cpu 1.123 total And top says 40M allocated. ** Summary: Ok, pretty fast, but an unexpected(!) amount of memory allocated. Now, this memory result is suspicious, I wonder if the now obsolete 'array fusion' is messing things up. In Data.ByteString.Lazy, we have: filter p = F.loopArr . F.loopL (F.filterEFL p) F.NoAcc We keep meaning to replace this stuff with the stream fusion mechanisms, which compile a lot better. Perhaps the time has come to look at that :) I'll put this memory allocation down as a bug that needs to be looked at. Program 2: hs/space-bs-c8-acc-1.hs: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Char8 as B cnt:: Int - B.ByteString - Int cnt !acc bs = if B.null bs then acc else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs) main = do s - B.getContents print (cnt 0 s) Strict bytestrings, and you manually fuse the length/filter calculation. Allocating all that memory will cost you. ** Prediction: O(N) memory, around 150M allocated Slower, due to cache effects (more data to traverse) and more indirections. ** Result top says 154M ./B +RTS -sstderr 150M 1.10s user 0.52s system 111% cpu 1.454 total ** Summary: Seems reasonable, when its doing all that allocation. Program 2a: But we could easily improve this program: Since: length (filter (== ' ') == length (filterByte ' ' == count ' ' And we have: import qualified Data.ByteString.Char8 as B cnt :: B.ByteString - Int cnt x = B.count ' ' x main = do s - B.getContents print (cnt s) ** Prediction: Which should be a bit faster. ** Result: $ time ./B 150M 24569024 ./B 150M 0.66s user 0.55s system 113% cpu 1.070 total ** Summary: So that's the fastest program so far. The rewrite rules to do these transformatoins used to be enabled, but need looking at again. There should also be no real benefit to manually fuse a length . filter loop like this, however, the old fusion system used in bytestring might have some small overhead here. This also needs looking at. Program 2b: We can do even better if we read the file in: import qualified Data.ByteString.Char8as B import System.IO.Posix.MMap main = print . B.count ' ' = mmapFile 150M ** Prediction: super fast ** Result: $ time ./B2 24569024 ./B2 0.31s user 0.01s system 101% cpu 0.314 total (similar results if you use vanilla B.readFile too, fwiw). Summary: This suggests to me we could look again at how strings of unknown size are read in.
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
Am Sonntag, 16. Dezember 2007 04:07 schrieb Don Stewart: Program 7: == hs/space-x-foldl.hs: {-# LANGUAGE BangPatterns #-} cnt :: String - Int cnt bs= foldl (\sum c - if c == ' ' then sum+1 else sum) 0 bs main = do s - getContents print (cnt s) Hmm. Lazy accumulator eh, on String? Should exhibit a space leak. Doesn't (with -O2, at least), seems ghc's strictness analyser did a good job. It is indeed about 10* slower than ByteStrings, but very memory friendly - and, actually on my machine it's faster (not much) than Data.List.foldl' . And, again on my machine, Programme 3 is almost as slow when compiled with 6.8.1 and twice as slow when compiled with 6.6.1. Nice little benchmark. -- Don Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
On Sun, 2007-12-16 at 04:53 +0100, Daniel Fischer wrote: Hmm. Lazy accumulator eh, on String? Should exhibit a space leak. Doesn't (with -O2, at least), seems ghc's strictness analyser did a good job. It is indeed about 10* slower than ByteStrings, but very memory friendly - Daniel is right, there's no space leak. Try it. You'll get a nice surprise :) -Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
firefly: On Sun, 2007-12-16 at 04:53 +0100, Daniel Fischer wrote: Hmm. Lazy accumulator eh, on String? Should exhibit a space leak. Doesn't (with -O2, at least), seems ghc's strictness analyser did a good job. It is indeed about 10* slower than ByteStrings, but very memory friendly - Daniel is right, there's no space leak. Try it. You'll get a nice surprise :) Very nice. If we disable the strictness analsyer, $ ghc -O2 -fno-strictness A.hs -no-recomp -o A We get a core loop that looks like: lgo :: Int - [Char] - Int lgo n xs = case xs of [] - n a : as - lgo (case a of -- sad dons C# c1_amH - case c1_amH of { ' ' - case n of I# i - I# (i +# 1) _- n; ) as Look at that big lazy expression for 'n' not being forced! And when run: 1015M 1017M onproc/1 - 0:05 22.36% A Scary stuff. Lots of Int thunks :) But enabling the strictness analyser: lgo :: Int# - [Char] - Int# lgo (n :: Int#) (xs :: [Char]) = case xs of [] - n a : as - case a of C# c - case c of ' ' - lgo (n +# 1) as -- makes me happy _ - lgo nas And life is good again :) What is quite amazing is how efficient this program is. I had to rerun this a dozen or so times, since I didn't quite believe it: $ time ./A /usr/obj/data/150M 24569024 ./A /usr/obj/data/150M 2.42s user 0.47s system 100% cpu 2.883 total Pretty stunning, I think. Swapping in a slightly more eager structure, the lazy ByteString, $ time ./A /usr/obj/data/150M 24569024 ./A /usr/obj/data/150M 0.86s user 0.07s system 98% cpu 0.942 total improves things by a good amount, but I think we can revisit the low level performance of lazy bytestrings again, in light of all the changes to the optimiser in the past 2 years. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
On Sat, 2007-12-15 at 21:18 -0800, Don Stewart wrote: What is quite amazing is how efficient this program is. Yep. I was very surprised too. Turns out there *was* a reason to run those string benchmarks, eh? ;) improves things by a good amount, but I think we can revisit the low level performance of lazy bytestrings again, in light of all the changes to the optimiser in the past 2 years. Yep. -Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [RFC] benchmarks of bytestrings, teaser
On Sat, 2007-12-15 at 19:07 -0800, Don Stewart wrote: Well, I'm not going to wait till Wednesday for the numbers! But I got you to write down predictions before you ran the tests, which is precisely what I wanted :) Summary, * Program 1 is fast, as expected, but exhbits a bug in the bytestring library's lazy bytestring fusion system. Something in length or filter isn't doing the right job. This code will be replaced by the stream fusion system soon. Good. * Program 2: as expected. strict IO uses O(N) space, and that has performance effects. * Program 3: lazy bytestrings use constant space, but you better avoid redundant bounds checks in the inner loops. Maybe its extra bounds-checking that makes it slow, as you say. It probably is. I must admit that I couldn't follow the core/stg/C--/assembly code at all. * Program 4: strings are silly No they are not. They are the default data structure for text and give the baseline that bytestrings should beat. I find it interesting to see *if* bytestrings beat it and if so, by how much. The vanilla string versions of my tests all use less memory than any of the other versions, but they are a bit slower. And perhaps not as much slower as they should be... * Program 5: as expected. similar to program 2. * Program 6: strict foldl's over lazy bytestrings are good :) fast, and constant space. * Program 7: see program 4. Pretty much as expected then, but with a bug identified in lazy bytestring fusion (I think). Nice little benchmark. Thanks :) There are more... -Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe