At 10:31 PM 10/1/2002 +0530, you wrote:
>found this link that converts html to xml.
>
>ask raj if it is absolutely necessary to convert to sgml/docbook or would xml
>  do ?

XML should do *much* better. You can convert to anything from that. SGML 
restricts us to the tools that are already available. XML tools are available 
too, but more importantly it is *much* easier to write your own 
file-format converter if you need one.
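
To give a rough idea of how little code such a converter takes, here is a minimal sketch in Python using only the standard library. The element names (article, title, para) and the function name xml_to_text are assumptions made for illustration, not a format we have agreed on.

```python
import xml.etree.ElementTree as ET

def xml_to_text(xml_text):
    """Convert a tiny, hypothetical XML article into plain text."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root:
        # Flatten any inline markup and normalise whitespace.
        text = " ".join("".join(elem.itertext()).split())
        if elem.tag == "title":
            # Underline titles, plain-text style.
            lines += [text, "=" * len(text), ""]
        elif elem.tag == "para":
            lines += [text, ""]
    return "\n".join(lines).rstrip() + "\n"

sample = ("<article><title>Hello</title>"
          "<para>Some <emphasis>XML</emphasis> text.</para></article>")
print(xml_to_text(sample))
```

Swapping the two branches for other output (HTML, TWiki markup, whatever) is the whole job, which is the point.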

>how abt xhtml ?

Nope. Won't do. A document in XHTML is just HTML following some strict XML 
rules. You would still have to convert the XHTML to the required format, which 
is about the same problem that you started with.

>tidy is under MIT License. is that considered free ?
>
>http://tidy.sourceforge.net
>
>http://www.w3.org/People/Raggett/tidy/
>

What do you need tidy for, other than prettying up the HTML source?

Personally, for existing content, I see no option other than to either convert 
manually or get tools to convert to DocBook SGML/XML (XML preferably).

For new content, to lessen the work of the content developers, I propose 
that the authors use the TWiki itself to add content. Its simple formatting 
rules would be trivial to convert to DocBook later on with a Perl script. 
That way we get a collaborative editing canvas at our disposal, with scope 
for generating other forms of documentation when needed.
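
As a rough sketch of what that conversion could look like (written in Python here rather than Perl, purely for illustration), the following handles only a handful of TWiki rules: "---+" headings, three-space "*" bullets, *bold* and _italic_. The function names are made up, every heading level is flattened into a plain section, and real TWiki markup has many more rules than this covers.

```python
import re
from xml.sax.saxutils import escape

def inline(text):
    """Escape XML specials, then map *bold* and _italic_ to DocBook emphasis."""
    text = escape(text)
    text = re.sub(r'(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)',
                  r'<emphasis role="bold">\1</emphasis>', text)
    text = re.sub(r'(?<!\w)_(\S(?:.*?\S)?)_(?!\w)',
                  r'<emphasis>\1</emphasis>', text)
    return text

def twiki_to_docbook(source):
    """Convert a small subset of TWiki markup to a DocBook article (sketch)."""
    out = ['<article>']
    in_list = False
    in_section = False
    for line in source.splitlines():
        heading = re.match(r'^---\++\s*(.*)$', line)
        bullet = re.match(r'^(?:\t| {3})\* (.*)$', line)
        if in_list and not bullet:
            out.append('</itemizedlist>')
            in_list = False
        if heading:
            # Assumption: all heading levels become sibling sections.
            if in_section:
                out.append('</section>')
            out.append('<section><title>%s</title>' % inline(heading.group(1)))
            in_section = True
        elif bullet:
            if not in_list:
                out.append('<itemizedlist>')
                in_list = True
            out.append('<listitem><para>%s</para></listitem>'
                       % inline(bullet.group(1)))
        elif line.strip():
            out.append('<para>%s</para>' % inline(line.strip()))
    if in_list:
        out.append('</itemizedlist>')
    if in_section:
        out.append('</section>')
    out.append('</article>')
    return '\n'.join(out)

print(twiki_to_docbook("---+ Intro\nSome *bold* text\n   * one\n   * two & three\n"))
```

The output is well-formed XML but has not been checked against the DocBook DTD, so treat it as a starting point only.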


>lots of tools are available for converting html to xml, but they are all
>  using MS technologies, like MSXMLDOM, which apparently makes it very easy to
>  do this conversion. can find out more about this if need be.

Oh no. Not that! MSXMLDOM is notorious for a) creating a lot of cruft in 
the output to allow reverse-conversion in the future, and b) (at least some 
time back) creating XML files with incompatibilities so that only the 
same parser can read them again, e.g. NOT quoting the first attribute in any 
tag (I have seen this with my own eyes!).
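
To illustrate why an unquoted attribute matters: the XML spec requires attribute values to be quoted, so a conforming parser rejects such a file outright. A quick check with Python's standard library parser (the sample tags and the helper name is_well_formed are made up for illustration, not actual MSXMLDOM output):

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

quoted = '<doc title="x" lang="en"/>'
unquoted = '<doc title=x lang="en"/>'   # first attribute left unquoted

print(is_well_formed(quoted))
print(is_well_formed(unquoted))
```

Running a check like this over any converter's output before we commit to it would catch that class of problem early.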

- Sandip
