It appears that the Haskell XML Toolbox may be what I want - http://www.fh-wedel.de/~si/HXmlToolbox/ - but any other suggestions would be appreciated. Apologies for relying on Haskell.org rather than Googling (I'll mail the web page maintainers).
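For concreteness, here is a rough sketch of the kind of SAX-like token stream asked for in the quoted message below. It is not taken from the Haskell XML Toolbox or HaXml; the names Token, tokenize, parseAttrs and splitQName are made up for illustration, and it uses only the base library. It assumes tags contain no ">" inside attribute values, and it ignores comments, CDATA, entity decoding and XML declarations, so it is nowhere near a real (X)HTML parser.

```haskell
module Main where

import Data.Char (isAlpha, isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical SAX-like events (not from any existing library).
data Token
  = Open String [(String, String)]  -- start tag: name and raw attributes
  | Close String                    -- end tag
  | Text String                     -- character data, entities left undecoded
  deriving (Eq, Show)

-- Tolerant tokenizer: a '<' that does not start a tag is kept as text.
-- Adjacent Text tokens are not merged.
tokenize :: String -> [Token]
tokenize [] = []
tokenize ('<' : '/' : rest) =
  let (name, after) = break (== '>') rest
   in Close (trim name) : tokenize (drop 1 after)
tokenize ('<' : rest@(c : _))
  | isAlpha c =
      let (body, after) = break (== '>') rest
          -- drop the trailing '/' of a self-closing tag such as <br/>
          (name, attrText) = break isSpace (dropWhileEnd (== '/') body)
       in Open name (parseAttrs attrText) : tokenize (drop 1 after)
tokenize (c : rest) =
  let (txt, after) = break (== '<') rest
   in Text (c : txt) : tokenize after

-- Parse name="value", name='value', name=value and bare names.
parseAttrs :: String -> [(String, String)]
parseAttrs s =
  case dropWhile isSpace s of
    [] -> []
    s' ->
      let (name, rest) = break (\c -> c == '=' || isSpace c) s'
       in case dropWhile isSpace rest of
            '=' : r ->
              let (val, rest') = parseValue (dropWhile isSpace r)
               in (name, val) : parseAttrs rest'
            r -> (name, "") : parseAttrs r

parseValue :: String -> (String, String)
parseValue (q : r)
  | q == '"' || q == '\'' =
      let (v, rest) = break (== q) r
       in (v, drop 1 rest)
parseValue r = break isSpace r

-- Split a qualified name into (prefix, local part); the prefix is ""
-- when there is no colon. The prefix is not resolved to a namespace URI.
splitQName :: String -> (String, String)
splitQName n =
  case break (== ':') n of
    (prefix, ':' : local) -> (prefix, local)
    _ -> ("", n)

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

main :: IO ()
main = mapM_ print (tokenize "<p t:if=\"x\">a < b &amp; c</p>")
```

The unescaped "<" in the sample input comes out as ordinary text rather than an error, which is the sort of leniency needed for typical malformed pages. A template tool could then pick out its own attributes by calling splitQName on each attribute name and comparing the prefix.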
Cheers,
Andrew

andrew cooke said:
>
> What are the options for parsing/lexing (X)HTML?  As far as I can see...
>
> - the HTML library in GHC (or from Andy Gill) is for creating documents,
> not parsing them
>
> - HaXml looks like it might do what I want, but (1) seems tricky to
> install (needs "make", which isn't that cool for Windows); (2) has a load
> of fancy-schmancy combinator stuff, when all I want is a stream of tokens
> (something like the Java SAX interface); (3) doesn't seem that solid on
> the basics (doesn't seem to handle namespaces (maybe they appear as part
> of the attribute name?), and I haven't yet worked out what it does about
> other "esoteric" things like character entities, XML declarations, CDATA,
> comments, etc).  (No offense implied - it's a cool piece of work, just
> doesn't seem to be what I'm looking for; this is all from reading the docs
> and API rather than looking at code, so I may be mistaken.)
>
> - nothing else on the haskell.org page appears to do parsing.
>
> I'd write it myself, but (X)HTML is deceptively complex, in my experience.
> You start off thinking it's going to be trivial (S-expressions), then you
> realise that HTML isn't XML, then there are character entities,
> weird CDATA things, namespaces, and that what you have isn't robust enough
> to parse typical malformed pages (unescaped "<" in text; unescaped data in
> URLs inside links (eg "&"), etc) that are accepted by browsers, etc.
>
> Maybe that's why there doesn't seem to be anything?!
>
> (I'm writing a simple tool that generates web pages from templates; the
> tool data appears in attributes with a namespace (this is the standard
> trick for mixing code generation with HTML in a way that web authoring
> tools can parse).  Hence the mix of requirements for HTML and XML.)
>
> Cheers,
> Andrew

-- 
personal web site: http://www.acooke.org/andrew
personal mail list: http://www.acooke.org/andrew/compute.html
_______________________________________________
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell