I have to ask for your help with general approaches to writing Haskell
programs that use space more economically.

I hope to use Haskell for my daily programming next semester,
so I'm trying to apply Haskell to some small practical applications.
The first try is a simple compression program. However, I found that
I need 25MB of RAM to compress a 31K file (total alloc = 25MB in
the time profile generated by GHC; I had to assign it a 2MB
stack and an 8MB heap to get it to run).

Enabling the heap profiler, I found the functions that consume the most
space to be
      main  -- Read the file, generate the Huffman tree and
               then encode. In pseudocode:
                  getArgs   >>= \ [file] ->
                  readFile file >>= \ fdata ->
                  let tree = huff_tree fdata
                  in output (huff_encode tree fdata)

 count_freq -- Given the file content, count the number of times each
               character appears.
     encode -- Given the file content and a table, generate the encoded
               output.

First of all, I noticed that there are two references to fdata.
That may have forced the program to keep the entire file in
memory. So I rewrote 'main' to read the same file twice.
It seemed to work! The total allocated memory dropped by 1 MB.
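In sketch form, the rewritten main reads the file once per traversal instead of sharing fdata between both passes. (This is a minimal, self-contained demo: firstPass and the demo file name are stand-ins I made up; the real program would call its own huff_tree / huff_encode here.)

```haskell
-- A stand-in first pass (e.g. building the frequency table for the
-- Huffman tree); the real program would call huff_tree here.
firstPass :: String -> Int
firstPass = length

main :: IO ()
main = do
  writeFile "demo-input.txt" "abracadabra"   -- demo input (hypothetical file)
  -- First read: traverse the contents and force the result, so this
  -- copy of the file contents can be garbage-collected afterwards.
  fdata1 <- readFile "demo-input.txt"
  let n = firstPass fdata1
  n `seq` return ()
  -- Second read: a fresh, independent traversal for the encoding pass,
  -- instead of holding on to fdata1 across both passes.
  fdata2 <- readFile "demo-input.txt"
  putStrLn (show n ++ " " ++ take 4 fdata2)
```

Because each readFile produces its own lazy stream, neither pass forces the other's copy of the file to stay live.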

Then I turned to 'encode'. It was originally written in
a tail-recursive style, that is:

       encode [] result = result
       encode (fd:fdata) result = encode fdata (do_something_about result)

Previously I thought tail recursion was good because it can be
transformed into a loop. However, in this style the final result
can't be extracted until all the computation has been done (am I right?).
Until then, the entire data must be kept in memory. So I switched to a
non-tail-recursive version,

       encode [] = ..
       encode (fd:fdata) = (something : (encode fdata))

in the hope that this would save some space when the program is evaluated
lazily. It also worked! The total alloc dropped to around 7 MB,
and the heap usage graph of 'encode' shrank from a high mountain
to several small hills. But I still had to give it megabytes of
stack and heap space.
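To illustrate the difference with a toy encoder (doubling each element stands in for do_something_about, which the post doesn't define, and here the accumulator version conses a transformed element onto the result): the lazy version is productive, so its output can be consumed and garbage-collected incrementally, while the accumulator version can yield nothing until the whole input has been traversed.

```haskell
-- Tail-recursive (accumulator) version: no output is available until
-- the entire input list has been consumed.
encodeAcc :: [Int] -> [Int] -> [Int]
encodeAcc []       result = reverse result
encodeAcc (x : xs) result = encodeAcc xs (x * 2 : result)

-- Lazy, productive version: each output element is available as soon
-- as the corresponding input element has been seen.
encodeLazy :: [Int] -> [Int]
encodeLazy []       = []
encodeLazy (x : xs) = x * 2 : encodeLazy xs

main :: IO ()
main = do
  print (encodeAcc [1, 2, 3] [])       -- [2,4,6]
  print (take 3 (encodeLazy [1 ..]))   -- [2,4,6]: works even on an
                                       -- unbounded input, because only
                                       -- the demanded prefix is built
```

The second print would loop forever with encodeAcc, since the accumulator version must reach the end of the list before returning anything.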

Finally it came to 'count_freq'. However, I had no good way
to fix it.

   count_freq :: [Char] -> [(Char, Freq)]
   count_freq str = [(chr i, freq_arry ! i) | i <- range (0,255),
                                              freq_arry ! i /= 0]
    where freq_arry = accumArray (+) 0 (0,255) [(ord ch, 1) | ch <- str]

What should I do?

Even if the whole file were loaded into memory, I don't understand
how a 31K file could balloon into megabytes. How should I write
Haskell programs so that they use memory more economically when
processing large files?
            
--
plato. rene descartes.william somerset maugham. albert einstein.alan turing
pablo picasso.homer.nicolaus copernicus         PALADIN MU            georg 
cantor. lau tzu . kurt goedel .johnann     [EMAIL PROTECTED]      seb
astian bach. alonzo church.claude monet   DEP. OF CIS, NCTU. TAIWAN  donald
knuth.charles babbage.john backus.von neumann.william shakespeare.aristotle

