Friedrich wrote:
I've written just a few programs in Haskell one in a comparison for a
task I had "nearly daily".
The first thing I notice is that this is clearly a direct translation from something like Perl. Thats understandable, but I'd suggest rewriting it with something like this (untested, uncompiled code)

-- Concatenate all the files into one big string. File reading is lazy, so this won't take all the memory.
getAllFiles :: [String] -> IO String
getAllFiles paths = do
  contents <- mapM getFile paths
  return $ concat contents

Then use "lines" to split the result into individual lines and process them using "filter", "map" and "foldr". Because file reading is lazy, each line is only read when it is to be processed, and then gets reaped by the garbage collector. So it all runs in constant memory.

(By the way, putting in the top level type declarations helps a lot when you make a mistake.)

One thing you are doing right is keeping a (sum, count) pair. A gotcha with Haskell is to compute an average of a list of numbers like this:

  mean :: [Double] -> Double
  mean xs = sum xs / fromIntegral (length xs)

The problem with this is that it has to traverse the list twice, which means that the whole list has to be held in memory. So instead you have to write something like:

mean xs = let (total, count) = foldr (\x (t, c) -> (t + x, c+1)) (0.0, 0) xs in total / fromIntegral count

This is a pain, but it does only traverse the list once.

See how you get on.

Paul.


The code analyzes Apache logs and picks some certain stuff from it and
after that calculates a bit around with it.

Here's the code
module Main where
import System
import System.IO
import System.Directory
import System.IO.Error
import Text.Regex
import Control.Monad

regexp = mkRegex ("([0-9]+) Windows ex")

main = do
       files <- show_dir "[0-9].*"
       (sum,count) <- run_on_all_files (0,0) files
       let dd = (fromIntegral (sum::Integer))/ (fromIntegral (count::Int))
           in
putStr("Download = " ++ show sum ++ " in " ++ show count ++ " days are " ++ show dd ++ " downloads/day\n")



run_on_all_files (a,b) [] = return (a,b)
run_on_all_files (a,b) (x:xs) = do (s,c) <- run_on(a,b) x
                                   run_on_all_files (s,c) xs


run_on (a,b) file_name = do
    handle <- openFile file_name ReadMode
    (sum,count) <- for_each_line (a,b) handle
    hClose handle
    return ((sum,count))
for_each_line (sum,count) handle = do
                       l <- try (hGetLine handle)
                       case l of
Left err | isEOFError err -> return(sum,count)
                                  | otherwise -> ioError err
Right line -> do let (nsum, ncount) = check_line line sum count for_each_line (nsum,ncount) handle check_line line sum count = let match = matchRegex regexp line
        in case match of
               Just strs -> (sum + read (head strs) :: Integer, count + 1)
               Nothing -> (sum, count)
show_dir regmatch = do files <- getDirectoryContents "."
                    let reg = mkRegex regmatch in
                              return(filter (\file_name -> let fm = matchRegex 
reg file_name
                                      in case fm of
                                      Just strs -> True
                                      Nothing -> False) files)


The point is this code works if there are just say a few files
files to check. But  it trashes my machine with around 1751 files.

It sucks memory as wild and so it does not run as I  think it should.

I think I've overseen something which is bad written. Would you mind
to  tell me where I did "extraordinarily" bad.

With best regards
Friedrich



_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell

Reply via email to