"Yitzchak Gale" <[EMAIL PROTECTED]> wrote: > Henning Thielemann wrote: > > HXT uses Parsec, which is strict. > > Is is strict to the extent that it cannot produce any > output at all until it has read the entire XML document? > That would make HXT (and Parsec, for that matter) > useless for a large percentage of tasks.
Yes, and yes.

By contrast, the Utrecht parser combinator library gives "online" results, meaning that it delivers as much output as it can without ambiguity. This is a bit like laziness, but it analyses the grammar to determine when it is safe to commit to a value: essentially, once no error has been seen in a prefix of the input.

The polyparse library has several variations of properly lazy parsers, which return results only on demand (although parse errors may be hidden inside the returned values, as exceptions). The user (the grammar-writer) decides where the results should be lazy or strict.

HaXml now uses the polyparse library, and you can choose between well-formedness checking with the original strict parser, and lazy, space-efficient, on-demand parsing. Initial performance results show that parsing XML lazily is consistently more than twice as fast as strict parsing, with roughly half the peak memory usage. In some usage patterns, it can reduce the cost of processing from linear in the size of the document to a constant: proportional only to how far into the document the element of interest lies.

I have just made fresh releases of development versions of these libraries, for those who would like to experiment:

    http://www.cs.york.ac.uk/fp/polyparse
    http://www.cs.york.ac.uk/fp/HaXml-devel

They are also available on hackage.haskell.org.

Regards,
Malcolm

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
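[The on-demand behaviour Malcolm describes can be illustrated without any library at all. The sketch below is NOT the polyparse API; `lazyItems` is a hypothetical hand-rolled "parser" for a comma-separated list, written so that demanding only the first item forces only a prefix of the input. A strict parser would have to traverse the whole string first.]

```haskell
-- A minimal sketch of on-demand parsing (illustrative only, not polyparse):
-- each item of a comma-separated list is produced as soon as its
-- terminating comma is reached, leaving the rest of the input unread.
lazyItems :: String -> [String]
lazyItems s = case break (== ',') s of
  (item, [])     -> [item]            -- end of input: final item
  (item, _:rest) -> item : lazyItems rest  -- emit item, defer the tail

main :: IO ()
main = do
  -- "undefined" stands for the (huge, unread) remainder of the input.
  -- A strict parser would force all of it and crash; the lazy one
  -- inspects input only up to the first comma.
  let input = "first," ++ undefined
  putStrLn (head (lazyItems input))   -- prints "first"
```

This is the sense in which the cost of finding an early element becomes constant rather than linear: laziness stops consuming input as soon as the demanded result is fully determined.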