Thanks - you should have replied to the list, because I think I did your package a disservice. I've just been looking at the Haskell XML Toolbox and comparing the two, and now that I understand a little more it seems like either will be fine for me.
In fact I will copy this to the list (hope that's OK), because maybe someone will find this info via Google one day and find it useful.

Cheers,
Andrew

Malcolm Wallace said:
> "andrew cooke" <[EMAIL PROTECTED]> writes:
>
>> - HaXml looks like it might do what I want, but
>> (1) seems tricky to install (needs "make", which isn't that cool for
>> Windows);
>
> Until the general Haskell Library Infrastructure project is
> sufficiently mature, I'm afraid 'make' is going to be pretty
> de rigueur for any build-from-source library.
>
> Having said that, in the case of HaXml I reckon it would be pretty
> straightforward to dispense with 'make' and issue a couple of 'ghc
> --make' commands by hand. Especially since you seem only to want a
> few of HaXml's facilities, not the complete set.
>
> Another alternative is simply to copy the small number of modules
> you need into your local build tree, and ignore the standard package
> mechanism altogether.
>
>> (2) has a load of fancy-schmancy combinator stuff, when all I want is a
>> stream of tokens (something like the Java SAX interface);
>
> If you really want only a stream of tokens, have a look at
> Text.XML.HaXml.Lex. For an error-correcting parse into a generic
> tree-like XML data structure, use Text.XML.HaXml.Html.Parse. You don't
> need the Combinators, Haskell2Xml, Xml2Haskell stuff at all.
>
>> (3) doesn't seem that solid on the basics
>> (doesn't seem to handle namespaces (maybe they appear as part
>> of the attribute name?)
>
> Namespaces are transparent, in the sense that the namespace is part
> of the element or attribute name, but there is no further automatic
> processing of it. So basically HaXml doesn't do anything fancy with
> namespaces, but it doesn't crash, or discard them either.
>
>> (and I haven't yet worked out what it does about
>> other "esoteric" things like character entities, XML declarations,
>> CDATA, comments, etc)).
>
> All of these are stored in the 'generic' XML data structure
> representation, so you can use them or discard them as you wish.
>
>   data Element   = Elem Name [Attribute] [Content]
>   type Attribute = (Name, AttValue)
>   data Content   = CElem Element
>                  | CString Bool CharData  -- bool is whether whitespace is significant
>                  | CRef Reference         -- character and entity references
>                  | CMisc Misc             -- comments, processing instructions, etc.
>
>> (No offense implied - it's a cool piece of work, just
>> doesn't seem to be what I'm looking for;
>
> None taken. I'm sure it looks complicated from the outside, but
> really it is just a collection of individual pieces that can be
> mixed and matched to suit the needs of any particular application.
>
>> I'd write it myself, but (X)HTML is deceptively complex, ...
>> HTML isn't XML,
>
> HaXml's special error-correcting HTML parser deals with most of this
> stuff, for instance self-closing tags (IMG), implicitly closed tags (P),
> improperly nested tags, and so on.
>
>> typical malformed pages (unescaped "<" in text; unescaped data in
>> URLs inside links (eg "&"), etc)
>
> These two examples of error situations might be beyond the current
> capability of the error-correcting parser, but I haven't checked in
> a long while.
>
> So in summary, I think HaXml will get you a long way towards your
> goal, but you will probably want to be selective about what you use,
> and there may be extra things you need to code for yourself on top.
>
> Regards,
>     Malcolm
>

--
personal web site: http://www.acooke.org/andrew
personal mail list: http://www.acooke.org/andrew/compute.html
_______________________________________________
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell
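
P.S. For anyone who finds this via Google: below is a minimal sketch of how the generic tree quoted above could be flattened into a SAX-like stream of tokens, and how a namespace prefix could be split off a name by hand. It does not import HaXml. The types are simplified local copies of the quoted ones, with AttValue, Reference and Misc replaced by plain strings as an assumption, and Token, tokens, contentTokens, splitPrefix and sample are hypothetical names of my own, not part of the library.

```haskell
-- Simplified local copies of the types quoted in the mail. The real
-- definitions live in HaXml and may differ between versions.
type Name      = String
type CharData  = String
type AttValue  = String              -- assumption: the real AttValue is richer
type Attribute = (Name, AttValue)

data Element = Elem Name [Attribute] [Content]
  deriving (Eq, Show)

data Content
  = CElem Element
  | CString Bool CharData            -- Bool: is whitespace significant?
  | CRef String                      -- assumption: reference kept as its raw name
  | CMisc String                     -- assumption: comment or PI kept as raw text
  deriving (Eq, Show)

-- Hypothetical SAX-like event type (not part of HaXml).
data Token
  = Open Name [Attribute]
  | Close Name
  | Text CharData
  | Ref String
  | Misc String
  deriving (Eq, Show)

-- Depth-first walk: emit Open, then the children's tokens, then Close.
tokens :: Element -> [Token]
tokens (Elem name attrs children) =
  Open name attrs : concatMap contentTokens children ++ [Close name]

contentTokens :: Content -> [Token]
contentTokens (CElem e)     = tokens e
contentTokens (CString _ s) = [Text s]
contentTokens (CRef r)      = [Ref r]
contentTokens (CMisc m)     = [Misc m]

-- Since the prefix is simply part of the name, split it off manually.
-- A name with no colon (or an empty prefix) is returned unchanged.
splitPrefix :: Name -> (Maybe String, String)
splitPrefix n = case break (== ':') n of
  (prefix, ':' : local) | not (null prefix) -> (Just prefix, local)
  _                                         -> (Nothing, n)

-- A small hand-built tree, roughly <p xml:lang="en">a <b>b</b>&amp;</p>.
sample :: Element
sample =
  Elem "p" [("xml:lang", "en")]
    [ CString False "a "
    , CElem (Elem "b" [] [CString False "b"])
    , CRef "amp"
    ]
```

Loading this in GHCi and evaluating `tokens sample` gives the flat event list, and `splitPrefix "xml:lang"` separates the prefix from the local name. Whether a flat stream or the tree itself is more convenient presumably depends on the application; the tree keeps nesting explicit, while the stream is closer to what a SAX user expects.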