I'm trying to convert an XML document, incrementally, into a sequence of XML events. A simple example XML document:

    <doc xmlns="org:myproject:mainns" xmlns:x="org:myproject:otherns">
      <title>Doc title</title>
      <x:ref>abc1234</x:ref>
      <html xmlns="http://www.w3.org/1999/xhtml"><body>Hello world!</body></html>
    </doc>

The document can be very large, and arrives in chunks over a socket, so I need to be able to "feed" the text data into a parser and receive a list of XML events per chunk. Chunks can be separated in time by intervals of several minutes to an hour, so pausing processing for the arrival of the entire document is not an option. The type signatures would be something like:

    type Namespace = String
    type LocalName = String

    data Attribute = Attribute Namespace LocalName String

    data XMLEvent =
        EventElementBegin Namespace LocalName [Attribute] |
        EventElementEnd Namespace LocalName |
        EventContent String |
        EventError String

    parse :: Parser -> String -> (Parser, [XMLEvent])

I've looked at HaXml, HXT, and hexpat, and unless I'm missing something, none of them can achieve this:

+ HaXml and hexpat seem to disregard namespaces entirely -- that is, the root element is parsed to "doc" instead of ("org:myproject:mainns", "doc"), and the second child is "x:ref" instead of ("org:myproject:otherns", "ref"). Obviously, this makes parsing mixed-namespace documents effectively impossible. I found an email from 2004[1] that mentions a "filter" for namespace support in HaXml, but no further information and no working code.

+ HXT looks promising, because I see explicit mention in the documentation of recording and propagating namespaces. However, I can't figure out if there's an incremental mode. A page on the wiki[2] suggests that SAX is supported in the "html tag soup" parser, but I want incremental parsing of *valid* documents. If incremental parsing is supported by the standard "arrow" interface, I don't see any obvious way to pull events out into a list -- I'm a Haskell newbie, and still haven't quite figured out monads yet, let alone Arrows.
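To make the namespace requirement concrete, here is a rough sketch of the resolution step I would expect such a parser to perform. It is not taken from any of the libraries above. It assumes some tokenizer already delivers prefix-qualified events; the names RawEvent, Resolver, resolveEvent and resolveAll are made up for illustration, and only the XMLEvent types come from the signatures above. The state threaded between calls is a stack of prefix-to-URI scopes, which is what would let parsing resume when the next chunk arrives.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as Map

type Namespace = String
type LocalName = String

data Attribute = Attribute Namespace LocalName String
  deriving (Eq, Show)

data XMLEvent
  = EventElementBegin Namespace LocalName [Attribute]
  | EventElementEnd Namespace LocalName
  | EventContent String
  | EventError String
  deriving (Eq, Show)

-- Hypothetical input: events from a tokenizer that ignores namespaces.
data RawEvent
  = RawBegin String [(String, String)]  -- qualified name, raw attributes
  | RawEnd String
  | RawText String

-- Prefix -> namespace URI; the empty prefix is the default namespace.
type Scope = Map.Map String Namespace

-- State carried between chunks: one scope per open element, innermost first.
newtype Resolver = Resolver [Scope]

newResolver :: Resolver
newResolver = Resolver [Map.empty]

-- "x:ref" -> ("x", "ref"); "doc" -> ("", "doc")
splitQName :: String -> (String, LocalName)
splitQName q = case break (== ':') q of
  (p, ':' : l) -> (p, l)
  _            -> ("", q)

-- Just the declared prefix if the attribute is xmlns or xmlns:p.
declaredPrefix :: String -> Maybe String
declaredPrefix "xmlns" = Just ""
declaredPrefix name = case splitQName name of
  ("xmlns", p) -> Just p
  _            -> Nothing

-- Resolve a qualified name; unprefixed names get the supplied namespace.
resolveName :: Namespace -> Scope -> String -> Either String (Namespace, LocalName)
resolveName dflt scope q = case splitQName q of
  ("", l) -> Right (dflt, l)
  (p, l)  -> maybe (Left ("unbound prefix: " ++ p)) (\ns -> Right (ns, l))
                   (Map.lookup p scope)

-- Unprefixed elements take the default namespace; unprefixed attributes take none.
elementName :: Scope -> String -> Either String (Namespace, LocalName)
elementName scope = resolveName (Map.findWithDefault "" "" scope) scope

attribute :: Scope -> (String, String) -> Either String Attribute
attribute scope (n, v) = do
  (ns, l) <- resolveName "" scope n
  return (Attribute ns l v)

currentScope :: [Scope] -> Scope
currentScope (s : _) = s
currentScope []      = Map.empty

resolveEvent :: Resolver -> RawEvent -> (Resolver, [XMLEvent])
resolveEvent (Resolver stack) raw = case raw of
  RawText t -> (Resolver stack, [EventContent t])
  RawBegin q attrs ->
    let decls = Map.fromList
          [(p, uri) | (n, uri) <- attrs, Just p <- [declaredPrefix n]]
        -- New declarations shadow inherited ones (Map.union is left-biased).
        scope = Map.union decls (currentScope stack)
        plain = [a | a@(n, _) <- attrs, declaredPrefix n == Nothing]
        event = do
          (ns, l) <- elementName scope q
          attrs'  <- mapM (attribute scope) plain
          return (EventElementBegin ns l attrs')
    in (Resolver (scope : stack), [either EventError id event])
  RawEnd q -> case stack of
    scope : rest@(_ : _) ->
      (Resolver rest, [either EventError (uncurry EventElementEnd) (elementName scope q)])
    _ -> (Resolver stack, [EventError ("unbalanced end tag: " ++ q)])

-- Resolve one chunk's worth of raw events, returning the state for the next chunk.
resolveAll :: Resolver -> [RawEvent] -> (Resolver, [XMLEvent])
resolveAll r raws =
  let (r', ess) = mapAccumL resolveEvent r raws in (r', concat ess)
```

Calling resolveAll once per chunk and passing the returned Resolver to the next call would mirror the shape of the parse signature above. The part this sketch leaves out, and the part I am really asking about, is a tokenizer that can itself be fed partial text.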

Are there any libraries that support namespace-aware incremental parsing?

[1] http://www.haskell.org/pipermail/haskell-cafe/2004-June/006252.html
[2] http://www.haskell.org/haskellwiki/HXT/Conversion_of_Haskell_data_from/to_XML
