Hi Johan,

> Here's how I would do it:

I implemented your method with only minimal changes (i.e. just adding a main driver in the same file):

> countUnigrams :: Handle -> IO (M.Map S.ByteString Int)
> countUnigrams = foldLines (\ m s -> M.insertWith (+) s 1 m) M.empty
>
> main :: IO ()
> main = do (f:_) <- getArgs
>           openFile f ReadMode >>= countUnigrams >>= print . M.toList

It seems to perform about 3x worse than the iteratee method in terms of time, and worse in terms of space :-(

On Brandon's War & Peace example, the hGetLine version takes 1.565 seconds for the small file, whereas my iteratee method takes 1.085s for the small file and around 2 minutes for the large file. For the large file, the code above starts consuming around 2.5GB of RAM, so it clearly has a space leak somewhere. Where, I don't know.

If you want to try it out, here's a short command line to make a test corpus the way Brandon made one:

+++
wget 'http://www.gutenberg.org/files/2600/2600.zip'; unzip 2600.zip; touch wnp100.txt; for i in {1..100}; do echo -n "$i "; cat 2600.txt >> wnp100.txt; done; echo "Done."
+++

Note that, as I detailed in my prior email to Brandon, even if you do end up with a (supposedly) non-leaking program for this example corpus, that doesn't mean it'll scale well to real-world data.

I also tried sprinkling strictness annotations throughout your code above, but I failed to produce good results :-(

> We definitely need more accessible material on how to reliably write
> fast Haskell code. There are those among us who can, but it shouldn't
> be necessary to learn it in the way they did (i.e. by lots of
> tinkering, learning from the elders, etc). I'd like to write a 60 (or
> so) pages tutorial on the subject, but haven't found the time.

I'd be an eager reader :-) Please do announce it on -cafe or the "usual places" should you ever get around to writing it!

I, unfortunately, don't really have any contact with "the elders," apart from what I read on their respective blogs…

> In addition to RWH, perhaps the slides from the talk on
> high-performance Haskell I gave could be useful:
>
> http://blog.johantibell.com/2010/09/slides-from-my-high-performance-haskell.html

Thanks, I'll give it a look later tomorrow!

Regards,
Aleks

PS: Sorry I didn't answer you in #haskell, I ended up having to go afk for a short while. Thanks for all your help!
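
For anyone who wants to reproduce the hGetLine measurements, here is a self-contained sketch along the lines of the program quoted above. foldLines comes from Johan's earlier mail and is not quoted here, so the definition below is only a hypothetical stand-in (a left fold over S.hGetLine that forces the accumulator at each step), and the choice of Data.ByteString.Char8 for S and Data.Map for M is likewise an assumption. It is meant as a starting point for experiments, not as a fix for the space leak.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Assumption: S and M refer to these modules; the quoted code does not say.
import qualified Data.ByteString.Char8 as S
import qualified Data.Map as M
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), hIsEOF, withFile)

-- Hypothetical stand-in for foldLines (the original definition is not
-- quoted in this mail): a left fold over the lines read from a handle.
-- The bang pattern forces the accumulator to weak head normal form at
-- every step; it does not force the values stored inside a Map.
foldLines :: (a -> S.ByteString -> a) -> a -> Handle -> IO a
foldLines f = go
  where
    go !acc h = do
      eof <- hIsEOF h
      if eof
        then return acc
        else do
          l <- S.hGetLine h
          go (f acc l) h

-- As quoted above: count how often each line occurs in the input.
countUnigrams :: Handle -> IO (M.Map S.ByteString Int)
countUnigrams = foldLines (\m s -> M.insertWith (+) s 1 m) M.empty

main :: IO ()
main = do
  (path:_) <- getArgs
  withFile path ReadMode countUnigrams >>= print . M.toList
```

Assuming the file is saved as Count.hs, compiling with `ghc -O2 -rtsopts Count.hs` and running `./Count wnp100.txt +RTS -s` prints GHC's memory and GC statistics, which is one way to compare variants of the fold.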
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe