BTW James, good call on suggesting tidyMarkup (instead of my original wellFormedHtml). As it turns out, a nasty sample file I found on the TagSoup site was not what anyone could consider HTML at all:

http://home.ccil.org/~cowan/XML/tagsoup/extreme.html

It comes out as well-formed XML just fine. (I wasn't surprised, of course, but the name now suits the results :-) Needless to say, I added it to the test case.

r.
