Friedrich wrote:
I've written just a few programs in Haskell one in a comparison for a
task I had "nearly daily".
The first thing I notice is that this is clearly a direct translation
from something like Perl. Thats understandable, but I'd suggest
rewriting it with something like this (untested, uncompiled code)
-- Concatenate all the files into one big string. File reading is lazy,
so this won't take all the memory.
getAllFiles :: [String] -> IO String
getAllFiles paths = do
contents <- mapM getFile paths
return $ concat contents
Then use "lines" to split the result into individual lines and process
them using "filter", "map" and "foldr". Because file reading is lazy,
each line is only read when it is to be processed, and then gets reaped
by the garbage collector. So it all runs in constant memory.
(By the way, putting in the top level type declarations helps a lot when
you make a mistake.)
One thing you are doing right is keeping a (sum, count) pair. A gotcha
with Haskell is to compute an average of a list of numbers like this:
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)
The problem with this is that it has to traverse the list twice, which
means that the whole list has to be held in memory. So instead you have
to write something like:
mean xs = let (total, count) = foldr (\x (t, c) -> (t + x, c+1))
(0.0, 0) xs in total / fromIntegral count
This is a pain, but it does only traverse the list once.
See how you get on.
Paul.
The code analyzes Apache logs and picks some certain stuff from it and
after that calculates a bit around with it.
Here's the code
module Main where
import System
import System.IO
import System.Directory
import System.IO.Error
import Text.Regex
import Control.Monad
regexp = mkRegex ("([0-9]+) Windows ex")
main = do
files <- show_dir "[0-9].*"
(sum,count) <- run_on_all_files (0,0) files
let dd = (fromIntegral (sum::Integer))/ (fromIntegral (count::Int))
in
putStr("Download = " ++ show sum ++ " in " ++ show count ++ " days are " ++ show dd ++ " downloads/day\n")
run_on_all_files (a,b) [] = return (a,b)
run_on_all_files (a,b) (x:xs) = do (s,c) <- run_on(a,b) x
run_on_all_files (s,c) xs
run_on (a,b) file_name = do
handle <- openFile file_name ReadMode
(sum,count) <- for_each_line (a,b) handle
hClose handle
return ((sum,count))
for_each_line (sum,count) handle = do
l <- try (hGetLine handle)
case l of
Left err
| isEOFError err -> return(sum,count)
| otherwise -> ioError err
Right line -> do
let (nsum, ncount) = check_line line sum count
for_each_line (nsum,ncount) handle
check_line line sum count =
let match = matchRegex regexp line
in case match of
Just strs -> (sum + read (head strs) :: Integer, count + 1)
Nothing -> (sum, count)
show_dir regmatch = do
files <- getDirectoryContents "."
let reg = mkRegex regmatch in
return(filter (\file_name -> let fm = matchRegex
reg file_name
in case fm of
Just strs -> True
Nothing -> False) files)
The point is this code works if there are just say a few files
files to check. But it trashes my machine with around 1751 files.
It sucks memory as wild and so it does not run as I think it should.
I think I've overseen something which is bad written. Would you mind
to tell me where I did "extraordinarily" bad.
With best regards
Friedrich
_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell
_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell