Re: [Haskell] Probably a trivial thing for people knowing Haskell

Paul Johnson Sat, 18 Oct 2008 10:48:33 -0700

Friedrich wrote:

I've written just a few programs in Haskell, one of them as a comparison
for a task I had "nearly daily".

The first thing I notice is that this is clearly a direct translation from something like Perl. That's understandable, but I'd suggest rewriting it with something like this (untested, uncompiled code):

-- Concatenate all the files into one big string. File reading is lazy, so this won't take all the memory.

getAllFiles :: [String] -> IO String
getAllFiles paths = do
  contents <- mapM readFile paths
  return $ concat contents

Then use "lines" to split the result into individual lines and process them using "filter", "map" and "foldr". Because file reading is lazy, each line is only read when it is to be processed, and then gets reaped by the garbage collector. So it all runs in constant memory.
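As a rough sketch of that pipeline (not the original poster's code): the block below uses only base, so a hand-written matcher stands in for the regex "([0-9]+) Windows ex". The names downloadsOn and downloadCounts are made up for illustration, and mapMaybe does the job of "filter" plus "map" in one step.

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for the regex "([0-9]+) Windows ex":
-- return the number that directly precedes " Windows ex", if any.
downloadsOn :: String -> Maybe Integer
downloadsOn = go ""
  where
    marker = " Windows ex"
    -- 'digits' is the run of digits just before the current position,
    -- kept in reverse order.
    go digits rest
      | marker `isPrefixOf` rest, not (null digits) = Just (read (reverse digits))
    go digits (c:cs)
      | isDigit c = go (c : digits) cs
      | otherwise = go "" cs
    go _ [] = Nothing

-- Split the input into lines and keep the number from each matching line.
-- The result is a lazy list, so lines are only examined as it is consumed.
downloadCounts :: String -> [Integer]
downloadCounts = mapMaybe downloadsOn . lines
```

With getAllFiles as above, something like `counts <- fmap downloadCounts (getAllFiles paths)` would give the numbers to sum and count. One thing to watch, I believe: mapM readFile opens every file before any of it is consumed, so with very many files the operating system's limit on open files may be reached.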

(By the way, putting in the top level type declarations helps a lot when you make a mistake.)

One thing you are doing right is keeping a (sum, count) pair. A gotcha with Haskell is to compute an average of a list of numbers like this:


  mean :: [Double] -> Double
  mean xs = sum xs / fromIntegral (length xs)

The problem with this is that it has to traverse the list twice, which means that the whole list has to be held in memory. So instead you have to write something like:

  mean xs = let (total, count) = foldr (\x (t, c) -> (t + x, c+1)) (0.0, 0) xs
            in total / fromIntegral count


This is a pain, but it does only traverse the list once.
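One caveat worth hedging: because the lambda pattern-matches on the pair, the foldr version has to recurse to the end of the list before it can add anything, so it may still use stack proportional to the list length. A possible variant, swapping in foldl' with a strict accumulator (my substitution, not the code above), is sketched here. The name meanStrict is made up for illustration.

```haskell
import Data.List (foldl')

-- Single-pass mean with a strict accumulator.
-- For an empty list this evaluates 0/0, i.e. NaN, like the two-pass version.
meanStrict :: [Double] -> Double
meanStrict xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0.0, 0 :: Int) xs
    -- Force both components before building the new pair, so that no
    -- chain of unevaluated additions accumulates.
    step (t, c) x =
      let t' = t + x
          c' = c + 1
      in t' `seq` c' `seq` (t', c')
```

The same step function works for an (Integer, Int) pair of download total and day count.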

See how you get on.

Paul.

The code analyzes Apache logs, picks out certain entries, and
then does some calculations on them.

Here's the code:

module Main where
import System
import System.IO
import System.Directory
import System.IO.Error
import Text.Regex
import Control.Monad

regexp = mkRegex ("([0-9]+) Windows ex")

main = do
       files <- show_dir "[0-9].*"
       (sum,count) <- run_on_all_files (0,0) files
       let dd = (fromIntegral (sum::Integer))/ (fromIntegral (count::Int))
           in
             putStr("Download = " ++ show sum ++ " in " ++ show count ++ " days are " ++ show dd ++ " downloads/day\n")




run_on_all_files (a,b) [] = return (a,b)
run_on_all_files (a,b) (x:xs) = do (s,c) <- run_on(a,b) x
                                   run_on_all_files (s,c) xs


run_on (a,b) file_name = do
    handle <- openFile file_name ReadMode
    (sum,count) <- for_each_line (a,b) handle
    hClose handle
    return ((sum,count))

for_each_line (sum,count) handle = do
                       l <- try (hGetLine handle)
                       case l of
                         Left err
                                  | isEOFError err -> return(sum,count)
                                  | otherwise -> ioError err
                         Right line -> do
                                  let (nsum, ncount) = check_line line sum count
                                  for_each_line (nsum,ncount) handle

check_line line sum count =
    let match = matchRegex regexp line
        in case match of
               Just strs -> (sum + read (head strs) :: Integer, count + 1)
               Nothing -> (sum, count)

show_dir regmatch = do
                    files <- getDirectoryContents "."
                    let reg = mkRegex regmatch in
                              return(filter (\file_name -> let fm = matchRegex reg file_name
                                      in case fm of
                                      Just strs -> True
                                      Nothing -> False) files)


The point is that this code works if there are just, say, a few
files to check. But it trashes my machine with around 1751 files.

It eats memory like crazy, so it does not run as I think it should.

I think I've overlooked something that is badly written. Would you mind
telling me where I did "extraordinarily" badly?

With best regards
Friedrich



_______________________________________________
Haskell mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell

