Hello Hari,

I definitely recommend the "translate to XML" approach, because you then
have much more control over the HTML that's produced (via an XSL
stylesheet), which matters if you want to do something meaningful with
the HTML.

Our product, the xDoc Converter (www.cambridgedocs.com), splits a Word
file into XML and an XSL which can be used for converting it to HTML.  You
can do it for one document or a thousand documents with the product, and
you can also embed it within an application and call it programmatically.
The product was just officially released, and a fully functional eval copy
can be downloaded from our site, www.cambridgedocs.com.

Thanks!
Riz


------------------------------
Riz Virk, (617) 905-3518
[EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.cambridgedocs.com


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Charles Reitzel
Sent: Saturday, January 04, 2003 4:37 PM
To: Hari M
Cc: [EMAIL PROTECTED]
Subject: Re: [cms-list] Microsoft Word to HTML


Hi Hari,

What Tidy options are you using?  Given the amount of markup that gets
chopped out of Word output, some reformatting of the source is
necessary.  But the results should look much the same when rendered in a
browser.

If you use the "clean" or "drop-font-tags" options, almost all presentation
data will be dropped.  See

http://tidy.sourceforge.net/docs/quickref.html#drop-font-tags
http://tidy.sourceforge.net/docs/quickref.html#clean
http://tidy.sourceforge.net/docs/quickref.html#output-xhtml
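
For what it's worth, here is a rough Python sketch of driving Tidy with the
three options linked above. The option names come from the quick reference;
the helper names (build_tidy_command, tidy_html) are made up for this
example, and it assumes a "tidy" executable on the PATH that still accepts
these options, so check against your own Tidy version.

```python
import subprocess

# Option names are taken from the Tidy quick reference linked above.
# Whether a given Tidy build accepts each one is an assumption to verify.
TIDY_OPTIONS = {
    "clean": "yes",
    "drop-font-tags": "yes",
    "output-xhtml": "yes",
}


def build_tidy_command(options=TIDY_OPTIONS, executable="tidy"):
    """Return an argv list that passes each option as `--name value`."""
    command = [executable, "-quiet"]
    for name, value in options.items():
        command += [f"--{name}", value]
    return command


def tidy_html(html, options=TIDY_OPTIONS):
    """Pipe HTML text through Tidy and return the cleaned markup."""
    result = subprocess.run(
        build_tidy_command(options),
        input=html,
        capture_output=True,
        text=True,
    )
    # Tidy exits with 1 when it only issued warnings and 2 on errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Keeping the options in one dictionary makes it easy to switch a single
option off and compare the output when the formatting comes out wrong.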

You will probably want to add a stylesheet to set fonts, etc.  This you
will need to do on your own.  Sed works well on Tidy output, but is shaky
on arbitrary markup.
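
As an illustration of that kind of post-processing, the sketch below does
in Python what a one-line sed substitution would do: it inserts a
stylesheet link just before </head>. The function name and the default
file name "word-import.css" are hypothetical, and it assumes the input is
Tidy output, i.e. a complete document with a head section.

```python
import re


def add_stylesheet_link(html, href="word-import.css"):
    """Insert a <link> to an external stylesheet just before </head>."""
    link = f'<link rel="stylesheet" type="text/css" href="{href}" />\n'
    # Tidy output is a regular, complete document with one </head>, which
    # is what makes a plain textual substitution like this reasonable.
    new_html, count = re.subn(
        r"</head>",
        lambda match: link + match.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("no </head> found; is this Tidy output?")
    return new_html
```

The error on a missing </head> is deliberate: on arbitrary markup a
textual substitution is not reliable, so it is better to fail loudly than
to return the page unchanged.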

Also, which version of Word and Tidy?

hth,
Charles Reitzel


At 03:51 PM 1/3/2003 -0800, Hari M wrote:

>What is the best way to get MS Word to HTML?
>
>I have a text box that users can use to enter information to upload to
>their website. Normally users copy and paste from MS Word. I use a WYSIWYG
>rich text box editor that can accept most MS Word formats.
>
>I tried using HTML Tidy as an option to remove the clutter that Word
>inserts - but it messes up the formatting.
>
>Is the best option to convert MS Word to XML and then to HTML?
>Thanks.
>
>I posted a similar question earlier but it did not appear on the list - my
>apologies if this appears twice.
>
>Thanks,
>
>Harry
>
>--
>http://cms-list.org/
>a wish for peace in the new year.

--
http://cms-list.org/
more signal, less noise.
