Ketil Malde <[EMAIL PROTECTED]> writes:

> HaXml on my list after TagSoup, which I'm about to get to work, I
> think (got distracted a bit ATM).

As it is, I managed to parse my document using TagSoup. One major
obstacle was the need to process a sizeable partition of the file.
Using 'partitions' from TagSoup (which is implemented using the
'groupBy (const (not . p))' trick) didn't work, as it requires space
proportional to the partition size.

My solution (and please forgive me, it is getting late at night here)
was to replace it with (slightly different semantics alert):

  import Control.Parallel (par)

  -- 'par' sparks evaluation of 'rest' while 'first' is being consumed.
  breaks :: (a -> Bool) -> [a] -> [[a]]
  breaks _ []     = []
  breaks p (x:xs) = let first = x : takeWhile (not.p) xs
                        rest  = dropWhile (not.p) xs
                    in  rest `par` first : if null rest then [] else breaks p rest

I have no idea how reliable this is, and I suspect it isn't very, but
on the plus side it does seem to work, as long as I compile with -smp.
Parsing 300 MB of XML and outputting the information in 305K records
takes approximately 5 minutes, and works with less than 1 GB of heap.
This is fast and small enough for my purposes.

Thanks for listening, and good night!

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
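
A footnote on the "slightly different semantics": the sketch below is only an illustration, assuming that TagSoup's 'partitions' drops any leading elements that fail the predicate before applying the groupBy trick. The names partitionsSketch and breaksSeq are hypothetical; breaksSeq is 'breaks' with the `par` spark left out, so that it needs nothing beyond base.

```haskell
import Data.List (groupBy)

-- Assumed shape of TagSoup's 'partitions' (hypothetical name):
-- discard the prefix before the first match, then start a new
-- group at every element that satisfies p.
partitionsSketch :: (a -> Bool) -> [a] -> [[a]]
partitionsSketch p = groupBy (const notp) . dropWhile notp
  where notp = not . p

-- 'breaks' without the parallel spark (hypothetical name).
-- Unlike partitionsSketch, the prefix before the first match is
-- kept as a group of its own.
breaksSeq :: (a -> Bool) -> [a] -> [[a]]
breaksSeq _ []     = []
breaksSeq p (x:xs) =
  let first = x : takeWhile (not . p) xs
      rest  = dropWhile (not . p) xs
  in  first : if null rest then [] else breaksSeq p rest
```

With p = (== 0) and the input [5,0,1,2,0,3], partitionsSketch gives [[0,1,2],[0,3]] while breaksSeq gives [[5],[0,1,2],[0,3]]. Under the stated assumption, the two agree whenever the first element already satisfies the predicate.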