dons:
> martine:
> > On 5/14/06, Eugene Crosser <[EMAIL PROTECTED]> wrote:
> > >main = printMax . (foldr processLine empty) . lines =<< getContents
> > >[snip]
> > >The thing kinda works on small data sets, but if you feed it with
> > >250,000 lines (1000 distinct), the process size grows to 200 Mb, and on
> > >500,000 lines I get "*** Exception: stack overflow" (using runhaskell
> > >from ghc 6.2.4).
> >
> > To elaborate on Udo's point:
> > If you look at the definition of foldr you'll see where the stack
> > overflow is coming from: foldr recurses all the way down to the end
> > of the list, so your stack gets 250k (or attempts 500k) entries deep
> > so it can process the last line in the file first, then unwinds.
>
> Also, don't use runhaskell! Compile the code with -O :)
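For what it's worth, the foldr point quoted above can be sketched with a strict left fold instead. This is only a guess at the shape of the original program: the bodies of processLine and the printMax stand-in (called mostFrequent here) are hypothetical, assuming the goal is to count distinct lines and report the most common one. Note that foldl' wants the accumulator as the first argument, so processLine's arguments are flipped compared with the foldr version.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for the original processLine: count how often
-- each distinct line occurs. The accumulator comes first for foldl'.
processLine :: Map.Map String Int -> String -> Map.Map String Int
processLine counts l = Map.insertWith (+) l 1 counts

-- Strict left fold: walks the list front to back and forces the
-- accumulator at every step, so the stack does not grow with the input.
countLines :: [String] -> Map.Map String Int
countLines = foldl' processLine Map.empty

-- Hypothetical stand-in for printMax: the most frequent line, if any.
-- On a tie, the line with the smallest key is kept.
mostFrequent :: Map.Map String Int -> Maybe (String, Int)
mostFrequent = Map.foldlWithKey' pick Nothing
  where
    pick Nothing k n = Just (k, n)
    pick best@(Just (_, m)) k n
      | n > m     = Just (k, n)
      | otherwise = best

main :: IO ()
main = print . mostFrequent . countLines . lines =<< getContents
```

Data.Map.Strict is used so that the per-line counts are forced as they are updated and don't pile up as thunks inside the map.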

Not sure what processLine does, but just trying out Data.ByteString on
this as a test:

> import qualified Data.ByteString.Char8 as B
> import Data.List
>
> main = print . foldl' processLine 0 . B.lines =<< B.getContents
>     where processLine acc l = if B.length l > 10 then acc+1 else acc

Just count the long lines. Probably you do something fancier.
Anyway, 32M runs through this in:

    $ time ./a.out < /home/dons/fps/tests/32M
    470400
    ./a.out < /home/dons/fps/tests/32M  0.31s user 0.28s system 28% cpu 2.082 total

with 32M heap (these are strict byte arrays).

Using Data.ByteString.Lazy:

> import qualified Data.ByteString.Lazy as B
> import Data.List
>
> main = print . foldl' processLine 0 . B.split 10 =<< B.getContents
>     where processLine acc l = if B.length l > 10 then acc+1 else acc

    $ time ./a.out < /home/dons/fps/tests/32M
    470400
    ./a.out < /home/dons/fps/tests/32M  0.32s user 0.11s system 26% cpu 1.592 total

With only 3M heap used.

-- Don
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe