What are the options for parsing/lexing (X)HTML? As far as I can see...

- the HTML library in GHC (or from Andy Gill) is for creating documents, not parsing them
- HaXml looks like it might do what I want, but:
  (1) it seems tricky to install (needs "make", which isn't that cool for Windows);
  (2) it has a load of fancy-schmancy combinator stuff, when all I want is a stream of tokens (something like the Java SAX interface);
  (3) it doesn't seem that solid on the basics: it doesn't seem to handle namespaces (maybe they appear as part of the attribute name?), and I haven't yet worked out what it does about other "esoteric" things like character entities, XML declarations, CDATA, comments, etc.
  (No offense implied - it's a cool piece of work, it just doesn't seem to be what I'm looking for; this is all from reading the docs and API rather than looking at code, so I may be mistaken.)
- nothing else on the haskell.org page appears to do parsing.

I'd write it myself, but (X)HTML is deceptively complex, in my experience. You start off thinking it's going to be trivial (S-expressions), then you realise that HTML isn't XML, then there are character entities, weird CDATA things, namespaces, and you find that what you have isn't robust enough to parse the typical malformed pages that browsers accept (unescaped "<" in text, unescaped data such as "&" in URLs inside links, etc). Maybe that's why there doesn't seem to be anything?!

(I'm writing a simple tool that generates web pages from templates; the tool data appears in attributes with a namespace (this is the standard trick for mixing code generation with HTML in a way that web authoring tools can parse). Hence the mix of requirements for HTML and XML.)

Cheers,
Andrew

--
personal web site: http://www.acooke.org/andrew
personal mail list: http://www.acooke.org/andrew/compute.html