Hi,

I am learning iteratees, and as a starter project I wanted to use
expat-enumerator to parse a 2-gigabyte XML file.

I expected to be able to do what SAX does in Java, i.e. avoid loading the
whole 2 gigabytes into memory.  As a warm-up, I wrote an iteratee to count the
lines in the file, and it does load the whole file into memory!  After
profiling, I found that the problem is Data.Enumerator.Text.utf8: it allocates
up to 60 megabytes when run on a 40-megabyte test file.
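For comparison, a line count does not need to go through a UTF-8 decoder at
all: in UTF-8 the byte 0x0A never occurs inside a multi-byte sequence, so
counting '\n' bytes in raw chunks gives the right answer. Below is a minimal
hand-rolled sketch of that idea. The Iteratee type and the names countLines,
enumHandle, enumList, run and countFileLines are hypothetical, not the
enumerator package's API; the sketch assumes only the bytestring package and
can be loaded in GHCi.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withFile)

-- Hypothetical minimal iteratee (NOT the enumerator package's type):
-- either finished with a result, or waiting for the next chunk,
-- where Nothing signals end of input.
data Iteratee a
  = Done a
  | Continue (Maybe B.ByteString -> Iteratee a)

-- Count '\n' (0x0A) bytes chunk by chunk. 0x0A never appears inside a
-- multi-byte UTF-8 sequence, so no decoding is needed to count lines.
countLines :: Iteratee Int
countLines = go 0
  where
    -- seq keeps the running total evaluated instead of building thunks
    go n = n `seq` Continue step
      where
        step Nothing = Done n
        step (Just chunk) = go (n + B.count 10 chunk)

-- Feed a Handle to an iteratee in fixed-size strict chunks, so only
-- about one chunk needs to be live at a time.
enumHandle :: Handle -> Iteratee a -> IO (Iteratee a)
enumHandle _ i@(Done _) = return i
enumHandle h i@(Continue k) = do
  chunk <- B.hGetSome h 65536
  if B.null chunk
    then return i
    else enumHandle h (k (Just chunk))

-- Feed an in-memory list of chunks (handy for testing).
enumList :: [B.ByteString] -> Iteratee a -> Iteratee a
enumList _ i@(Done _) = i
enumList [] i = i
enumList (c : cs) (Continue k) = enumList cs (k (Just c))

-- Send end-of-input and extract the result, if one was produced.
run :: Iteratee a -> Maybe a
run (Done a) = Just a
run (Continue k) = case k Nothing of
  Done a -> Just a
  Continue _ -> Nothing

-- Hypothetical entry point: count the lines of a file on disk.
countFileLines :: FilePath -> IO (Maybe Int)
countFileLines path =
  withFile path ReadMode (\h -> run <$> enumHandle h countLines)
```

Because the count is forced at every step and each chunk is dropped once it
has been counted, memory use should stay roughly at one 64 KB chunk no matter
how large the file is.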

Any suggestions on how to fix Text.utf8, or on what people do to parse
UTF-8-encoded text files with iteratees?
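One general approach, sketched here under stated assumptions, is to decode
each chunk on its own and carry any incomplete multi-byte sequence at its end
over to the next chunk, so memory stays bounded by the chunk size. The names
splitComplete, decodeStep and decodeChunks are hypothetical; the sketch
assumes the bytestring and text packages and uses
Data.Text.Encoding.decodeUtf8, which throws on invalid input. A list of
chunks stands in for the stream an enumerator would deliver.

```haskell
import Data.Bits ((.&.))
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Expected length of a UTF-8 sequence from its lead byte
-- (0 for a continuation byte or an invalid lead byte).
seqLen :: Word8 -> Int
seqLen b
  | b < 0x80 = 1
  | b .&. 0xE0 == 0xC0 = 2
  | b .&. 0xF0 == 0xE0 = 3
  | b .&. 0xF8 == 0xF0 = 4
  | otherwise = 0

-- Continuation bytes look like 10xxxxxx.
isCont :: Word8 -> Bool
isCont b = b .&. 0xC0 == 0x80

-- Hypothetical helper: split a chunk into (complete prefix, incomplete tail).
splitComplete :: B.ByteString -> (B.ByteString, B.ByteString)
splitComplete bs = go (B.length bs - 1) (0 :: Int)
  where
    -- Walk back over at most 3 continuation bytes to the last lead byte.
    go i k
      | i < 0 || k > 3 = (bs, B.empty)
      | isCont (B.index bs i) = go (i - 1) (k + 1)
      | otherwise =
          let need = seqLen (B.index bs i)
              have = B.length bs - i
           in if need > have then B.splitAt i bs else (bs, B.empty)

-- Hypothetical per-chunk step: decode what is complete, keep the rest.
decodeStep :: B.ByteString -> B.ByteString -> (T.Text, B.ByteString)
decodeStep leftover chunk =
  let (complete, rest) = splitComplete (B.append leftover chunk)
   in (TE.decodeUtf8 complete, rest)

-- Decode a stream of chunks. A truncated sequence at the very end of
-- input is passed to decodeUtf8, which then throws.
decodeChunks :: [B.ByteString] -> [T.Text]
decodeChunks = go B.empty
  where
    go leftover [] = [TE.decodeUtf8 leftover | not (B.null leftover)]
    go leftover (c : cs) =
      let (t, rest) = decodeStep leftover c
       in t : go rest cs
```

Later releases of the text package also provide
Data.Text.Encoding.streamDecodeUtf8, which does this chunk-boundary
bookkeeping itself.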

Thanks!

Here are my code and the profiling results:

http://i.imgur.com/XEI1v.png
http://hpaste.org/46037/counting_lines_with_iteratees
http://hpaste.org/46038/counting_lines_with_iteratees

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe