At 10:31 PM 10/1/2002 +0530, you wrote:
>found this link that converts html to xml.
>
>ask raj if it is absolutely necessary to convert to sgml/docbook or would xml
> do ?

XML should do *much* better. You can convert to anything from that. SGML restricts us to the tools that are available. XML tools are available too, but more importantly it is *much* easier to write your own file-format converter if you want.

>how abt xhtml ?

Nope. Won't do. A document in xhtml is just html following some strict XML rules. You still have to convert the xhtml to the format that is required, which is about the same problem that you started with.

>tidy is under MIT License. is that considered free ?
>
>http://tidy.sourceforge.net
>
>http://www.w3.org/People/Raggett/tidy/
>

What do you need tidy for, other than prettying up the HTML source?

Personally, for existing content, I see no option other than to either convert manually or get tools to convert to docbook sgml/xml (xml preferably).

For new content, to lessen the work of the content developers, I propose that the authors use the TWiki itself to add content. Its simple formatting rules would be trivial to convert to docbook later on using a perl script. That way we get a collaborative editing canvas at our disposal, with scope for generating other forms of documentation when needed.

>lots of tools are available for converting html to xml, but they are all
> using MS technologies, like MSXMLDOM, which apparently makes it very easy to
> do this conversion. can find out more abt this is need be.

Oh no. Not that! MSXMLDOM is notorious for
a) creating a lot of cruft in the output to allow reverse-conversion in the future.
b) (at least some time back) creating XML files with some incompatibility so that only the same parser can read them again, e.g. NOT quoting the first attribute in any tag (I have seen this with my own eyes!).

- Sandip
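
P.S. To make the TWiki-to-docbook idea a bit more concrete, here is a rough sketch of what such a converter could look like. It is written in Python rather than perl only to keep the example short, and it is an illustration under assumptions, not a finished tool. The function names (`inline`, `twiki_to_docbook`) are made up for this sketch. It handles only a small subset of TWiki markup: `---+` headings, `*bold*`, `_italic_`, `=fixed=`, and bullets indented by multiples of three spaces. Headings are emitted as `<bridgehead>` elements instead of properly nested sections, and nested bullets are flattened into one list.

```python
import re
from xml.sax.saxutils import escape


def _wrap(delim, open_tag, close_tag, text):
    """Replace delim...delim spans (not touching word characters) with a tag pair."""
    d = re.escape(delim)
    pattern = rf"(?<!\w){d}([^\s{d}](?:[^{d}]*[^\s{d}])?){d}(?!\w)"
    return re.sub(pattern, lambda m: open_tag + m.group(1) + close_tag, text)


def inline(text):
    """Escape XML special characters, then map TWiki inline markup to DocBook."""
    text = escape(text)
    # =fixed= goes first so the '=' in role="bold" is never seen by this pass.
    text = _wrap("=", "<literal>", "</literal>", text)
    text = _wrap("*", '<emphasis role="bold">', "</emphasis>", text)
    text = _wrap("_", "<emphasis>", "</emphasis>", text)
    return text


def twiki_to_docbook(source):
    """Convert a small subset of TWiki markup to a DocBook fragment (no root element)."""
    out = []
    para = []        # lines of the paragraph currently being collected
    in_list = False  # True while an <itemizedlist> is open

    def flush_para():
        if para:
            out.append("<para>" + inline(" ".join(para)) + "</para>")
            para.clear()

    def close_list():
        nonlocal in_list
        if in_list:
            out.append("</itemizedlist>")
            in_list = False

    for line in source.splitlines():
        heading = re.match(r"^---(\++)\s*(.+)$", line)
        bullet = re.match(r"^(?: {3})+\* (.+)$", line)
        if heading:
            flush_para()
            close_list()
            level = min(len(heading.group(1)), 5)  # bridgehead supports sect1..sect5
            title = inline(heading.group(2).strip())
            out.append(f'<bridgehead renderas="sect{level}">{title}</bridgehead>')
        elif bullet:
            flush_para()
            if not in_list:
                out.append("<itemizedlist>")
                in_list = True
            out.append("<listitem><para>" + inline(bullet.group(1)) + "</para></listitem>")
        elif not line.strip():
            # A blank line ends both the current paragraph and any open list.
            flush_para()
            close_list()
        else:
            close_list()
            para.append(line.strip())

    flush_para()
    close_list()
    return "\n".join(out)


if __name__ == "__main__":
    print(twiki_to_docbook("---+ Install\n\nRun =make= and *wait*.\n\n   * one\n   * _two_\n"))
```

The output is only a fragment, so it would still need to be wrapped in a root element such as `<article>` before a DocBook toolchain could process it. A real converter would also have to deal with tables, links and WikiWords, which this sketch ignores.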