Thanks for taking a look at this. kirby81: > 2008/5/17 Andrew Coppin <[EMAIL PROTECTED]>: > > Hi folks. > > > > OK, try this: > > > > darcs get http://darcs.orphi.me.uk/MyMD5 > > cd MyMD5 > > ghc -O2 --make md5sum > > md5sum <some large filename> > > I've got some time to take a look at the code. It's very nice, > readable and declarative, but obviously not optimized for "raw speed". > There're a few things to do to have a speed-up of 2x, without going > "low-level". > > 1) There's a lazyness leak in Pad.hs. The sum in: > else make_block_bs bs0 : work (n + block_size_bytes) bs1 > is not strict. With a very large file (e.g. 100 mb) it means stack overflow. > > [roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs > > [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip > Stack space overflow: current size 8388608 bytes. > Use `+RTS -Ksize' to increase it. > > To solve it just add a bang (!) before the n parameter: > > work !n bs = > > You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file > too. There're solutions that don't imply BangPatterns, and use only > seq, but I like bang patterns! > > Now it works: > [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip > E6542028D538BAEC180A96A5D1E6EC3A > 11.958u 0.210s 0:12.19 99.7% 0+0k 0+2io 0pf+0w
Good idea. I saw some lazy left folds in there too, which look suspicious. > > 2) make_block_bs is sub-optimal, and very critical to performance. I > decided to use Data.Binary for it (it's also more readable, and you > get rid of the unsafeWrite): > import Data.Binary > import Data.Binary.Get > import Control.Monad > > // ... > > instance Binary Block where > put _ = undefined > get = do > xs <- replicateM 16 getWord32le > return $ Block $ listArray (0, 15) xs > > make_block_bs :: B.ByteString -> Block > make_block_bs = decode > > Now we are getting faster: > [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip > E6542028D538BAEC180A96A5D1E6EC3A > 8.063u 0.174s 0:08.23 100.0% 0+0k 0+0io 0pf+0w Definitely a good idea. > 3) You are doing a lot of access to fields of a strict data type > (State). You can at least ask the compiler to help you a bit with > -funbox-strict-fields. > > Now we have: > [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip > E6542028D538BAEC180A96A5D1E6EC3A > 6.780u 0.133s 0:06.91 100.0% 0+0k 0+1io 0pf+0w > > We have got a good, nearly 2x, speed-up with very few optimizations, > and we run in a very small constant amount of memory. Great. > > Probably compiling with -fvia-C could help even more, but strangely: > > [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields > -O2 --make md5sum.hs > [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip > Segmentation fault > > Is this supposed to happen? There're no "unsafe" functions or imports > used in the program. Maybe a bug in GHC? That could be a gcc bug, in fact. Can you provide the final source? Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 ) Andrew, I hope you look carefully at the suggestions here, and use them to improve the code you were working on. -- Don _______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
