Hello Haskell Cafe,

I really hope this is the right list for this sort of question. I've bugged the folks in #haskell, and they said to ask here, so I'm turning to you.

I want to use Hexpat to read in some humongous XML files (linguistic corpora), since it's the only Haskell XML library I could find that takes ByteStrings as input. I stumbled on a problem when using one of the examples from the docs of Text.XML.Expat.Tree. The "cookbook recipe" there suggests *first* processing the data, and only then looking at the parser error to see whether anything went wrong. As I understand it, this should prevent the parse tree from being fully evaluated before use. Unfortunately, that is not what happens on my system (GHC 6.12.1, if that matters).

This is the code from the docs, modified to read a file:

> import Text.XML.Expat.Tree
> import System.Environment (getArgs)
> import Control.Monad (liftM)
> import qualified Data.ByteString.Lazy as C
>
> -- This is the recommended way to handle errors in lazy parses
> main = do
>     f <- liftM head getArgs >>= C.readFile
>     let (tree, mError) = parse defaultParseOptions f
>     print (tree :: UNode String)
>
>     -- Note: We check the error _after_ we have finished our processing
>     -- on the tree.
>     case mError of
>         Just err -> putStrLn $ "It failed : "++show err
>         Nothing -> putStrLn "Success!"

Given a 42 MB test file, an invocation like this:

    % ghc --make -O2 Hexpat.hs
    % ./Hexpat input.xml > dump.xml

will gobble up at least 2 GB of RAM. (I usually kill it before it starts thrashing the swap space, since that almost crashes my entire machine.)

If I remove the last three lines:

> import Text.XML.Expat.Tree
> import System.Environment (getArgs)
> import Control.Monad (liftM)
> import qualified Data.ByteString.Lazy as C
>
> main = do
>     f <- liftM head getArgs >>= C.readFile
>     let (tree, mError) = parse defaultParseOptions f
>     print (tree :: UNode String)

the same invocation and input file barely use a megabyte or two of RAM, and the program finishes really quickly.

Why is that? Is this a mistake in the Hexpat docs, or am I doing something wrong?

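
For reference, this is how I understand the check-the-error-last pattern, reduced to a toy that needs only base. `parseInts` is a made-up stand-in for hexpat's `parse` (it is not part of any library), so it says nothing about what hexpat does internally, and it is far too small to show the memory problem. It only sketches the order of evaluation I expect: consume the lazily produced result first, then look at the error.

```haskell
import Data.Char (isDigit)

-- Hypothetical toy "parser" (an assumption for illustration, not hexpat):
-- lazily converts tokens to Ints and reports the first bad token, if any.
parseInts :: [String] -> ([Int], Maybe String)
parseInts [] = ([], Nothing)
parseInts (w:ws)
  | not (null w) && all isDigit w =
      -- The let pattern is lazy, so the rest of the input is only
      -- examined when the consumer demands more output or the error.
      let (ns, err) = parseInts ws
      in  (read w : ns, err)
  | otherwise = ([], Just ("bad token: " ++ show w))

main :: IO ()
main = do
  let (ns, mError) = parseInts (words "1 2 3 x 5")
  -- Process the result first ...
  print (sum ns)
  -- ... and only afterwards check whether the parse stopped early.
  case mError of
    Just err -> putStrLn ("It failed: " ++ err)
    Nothing  -> putStrLn "Success!"
```

Here the sum is printed before the error is inspected, which is the behaviour I expected from the hexpat version as well.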
Lazy IO has always been a little bit of a mystery to me, and just when I thought I had it...

Thanks for any help on the matter!

Aleks