Hello Hari,

I definitely recommend the "translate to XML" approach, because you then have much more control over the HTML that's produced (via an XSL), which matters if you want to do something meaningful with the HTML.
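To illustrate the kind of control meant here, below is a minimal sketch of the XML-to-HTML step. It uses Python's standard-library ElementTree rather than an actual XSL stylesheet, purely to stay self-contained. The intermediate vocabulary (`doc`, `heading`, `para`), the `TAG_MAP` table, and the `to_html` helper are hypothetical; they are not the format any particular converter produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate XML; the element names are assumptions for this sketch.
SOURCE_XML = """<doc>
  <heading>Release notes</heading>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</doc>"""

# Mapping from intermediate elements to the HTML tags to emit.
# Changing this table changes the output markup without touching the content,
# which is roughly the role an XSL stylesheet would play.
TAG_MAP = {"heading": "h1", "para": "p"}


def to_html(xml_text):
    """Convert the hypothetical intermediate XML into a simple HTML fragment."""
    root = ET.fromstring(xml_text)
    body = ET.Element("div", {"class": "content"})
    for child in root:
        html_tag = TAG_MAP.get(child.tag)
        if html_tag is None:
            continue  # skip elements this sketch does not know about
        node = ET.SubElement(body, html_tag)
        node.text = child.text
    return ET.tostring(body, encoding="unicode", method="html")


if __name__ == "__main__":
    print(to_html(SOURCE_XML))
```

Because the content lives in the XML and the presentation decisions live in the mapping, switching `heading` from `h1` to `h2` is a one-line change that applies to every converted document.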
Our product, the xDoc Converter (www.cambridgedocs.com), splits a Word file into XML and an XSL that can be used to convert it to HTML. You can do this for one document or a thousand documents with the product, and you can also embed it within an application and call it programmatically. The product was just officially released, and a fully functional eval copy can be downloaded from our site, www.cambridgedocs.com.

Thanks!
Riz

------------------------------
Riz Virk, (617) 905-3518
[EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.cambridgedocs.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Charles Reitzel
Sent: Saturday, January 04, 2003 4:37 PM
To: Hari M
Cc: [EMAIL PROTECTED]
Subject: Re: [cms-list] Microsoft Word to HTML

Hi Hari,

What Tidy options are you using? Given the amount of markup that gets chopped out of Word output, some reformatting of the source is necessary. But the results are fairly neutral to rendering in a browser. If you use the "clean" or "drop-font-tags" options, almost all presentation data will be dropped. See

http://tidy.sourceforge.net/docs/quickref.html#drop-font-tags
http://tidy.sourceforge.net/docs/quickref.html#clean
http://tidy.sourceforge.net/docs/quickref.html#output-xhtml

You will probably want to add a stylesheet to set fonts, etc. This you will need to do on your own. Sed works well on Tidy output, but is shaky on arbitrary markup.

Also, which version of Word and Tidy?

hth,
Charles Reitzel

At 03:51 PM 1/3/2003 -0800, Hari M wrote:
>What is the best way to get MS Word to HTML?
>
>I have a text box that users can use to enter information to upload to
>their website. Normally users copy and paste from MS Word. I use a WYSIWYG
>rich text box editor that can accept most MS Word formats.
>
>I tried using HTML Tidy as an option to remove the clutter that Word
>inserts, but it messes up the format.
>
>Is the best option to convert MS Word to XML and then to HTML?
>Thanks.
>
>I posted a similar question earlier but it did not appear on the list - my
>apologies if this appears twice.
>
>Thanks,
>
>Harry
>
>--
>http://cms-list.org/
>a wish for peace in the new year.

--
http://cms-list.org/
more signal, less noise.
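
The Tidy options named in the thread ("clean", "drop-font-tags", "output-xhtml") can be passed on the command line as `--option value` pairs. Below is a minimal sketch that builds such a command from Python. It assumes a `tidy` executable is on the PATH; the helper names `tidy_command` and `run_tidy` and the file name `pasted.html` are hypothetical. The option names are as given in the thread, so check them against your Tidy version, since newer releases may not accept all of them.

```python
import subprocess

# Options taken from the thread; the values are an assumption for this sketch.
TIDY_OPTIONS = {
    "clean": "yes",
    "drop-font-tags": "yes",
    "output-xhtml": "yes",
}


def tidy_command(path):
    """Build the argument list for running Tidy on one HTML file."""
    cmd = ["tidy", "-quiet"]
    for name, value in TIDY_OPTIONS.items():
        cmd += ["--" + name, value]
    cmd.append(path)
    return cmd


def run_tidy(path):
    """Run Tidy and return the cleaned markup it writes to standard output."""
    # Tidy exits with status 1 when it only has warnings, so a non-zero
    # status is not treated as a failure here.
    result = subprocess.run(tidy_command(path), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(tidy_command("pasted.html")))
```

Keeping the options in one dictionary makes it easy to switch "clean" or "drop-font-tags" off individually and compare how much of the Word formatting survives.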