Another option is Metaverse's XForm Web Service, a .NET Web Service which converts MS Word documents to XML format, and includes the XSL stylesheets to convert the XML to nicely formatted HTML. The XForm Web Service is server-safe: it is designed for situations where many users will be calling it at the same time, and in high volume. That's a big distinction, as there are a few other tools which will do the job, but they are by and large not fit for high-volume server-side applications. Word can become unstable and freeze up during automation requests with some documents, and if you are building a custom content management system, you never know what kinds of documents your users are going to throw at it - you don't want the service to become unavailable for other users if one user gives it an invalid document. Just something to look out for.
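
To make the "one bad document should not take the service down" point concrete, here is a minimal Python sketch of one common way to isolate each conversion: run the converter in its own process with a timeout. This is not how XForm works internally (the message does not say), and the converter command is a placeholder you would replace with whatever tool you actually use.

```python
import subprocess
import sys


def convert_with_timeout(cmd, timeout_seconds=30):
    """Run a converter command in its own process.

    `cmd` is a hypothetical converter command line (a list of strings).
    Returns (ok, text): the converter's stdout on success, or an error
    message on failure.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a hung
        # converter does not linger and block later requests.
        return False, "conversion timed out"
    except OSError as exc:
        return False, f"could not start converter: {exc}"
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, result.stdout


if __name__ == "__main__":
    # Stand-in "converter": the Python interpreter printing a tiny XML document.
    print(convert_with_timeout([sys.executable, "-c", "print('<doc/>')"]))
```

Because each document gets its own process, a document that hangs or crashes the converter only fails that one request.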
You can create a free trial account and see a demo of the XForm web service at:
http://www.metaverse.cc or http://xform.metaverse.cc/xformdemo

Regards,
Doug

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Rizwan Virk
Sent: Monday, January 13, 2003 8:58 AM
To: Hari M
Cc: [EMAIL PROTECTED]
Subject: RE: [cms-list] Microsoft Word to HTML

Hello Hari,

I definitely recommend the "translate to XML" approach, because you then have much more control over the HTML that's produced (via an XSL) - if you want to do something meaningful with the HTML.

Our product, the xDoc Converter (www.cambridgedocs.com), splits a Word file into XML and an XSL which can be used for converting it to HTML. You can do it for one document or a thousand documents with the product - you can also embed it within an application and call it programmatically.

The product was just officially released, and a fully functional eval copy can be downloaded on our site, www.cambridgedocs.com.

Thanks!
Riz

------------------------------
Riz Virk, (617) 905-3518
[EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.cambridgedocs.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Charles Reitzel
Sent: Saturday, January 04, 2003 4:37 PM
To: Hari M
Cc: [EMAIL PROTECTED]
Subject: Re: [cms-list] Microsoft Word to HTML

Hi Hari,

What Tidy options are you using? Given the amount of markup that gets chopped out of Word output, some reformatting of the source is necessary. But the results are fairly neutral to rendering in a browser. If you use the "clean" or "drop-font-tags" options, almost all presentation data will be dropped. See

http://tidy.sourceforge.net/docs/quickref.html#drop-font-tags
http://tidy.sourceforge.net/docs/quickref.html#clean
http://tidy.sourceforge.net/docs/quickref.html#output-xhtml

You will probably want to add a stylesheet to set fonts, etc. This you will need to do on your own.
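
As a rough illustration of the three options named above, the following Python sketch builds a `tidy` command line and pipes pasted Word HTML through it. It assumes the `tidy` command-line tool is installed and on the PATH; the function names are made up for this example, and newer Tidy releases may no longer accept `drop-font-tags`, so treat the flag set as something to check against your Tidy version.

```python
import shutil
import subprocess


def tidy_command(clean=True, drop_font_tags=True, output_xhtml=True):
    """Build a tidy argument list from the three options discussed above."""
    def yn(flag):
        return "yes" if flag else "no"

    return [
        "tidy", "-q",
        "--clean", yn(clean),
        "--drop-font-tags", yn(drop_font_tags),
        "--output-xhtml", yn(output_xhtml),
    ]


def tidy_word_html(html):
    """Pipe HTML through tidy and return the cleaned markup (hypothetical helper)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        tidy_command(), input=html, capture_output=True, text=True
    )
    # tidy exits with 0 for clean input, 1 for warnings, 2 for errors.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

The cleaned output carries little presentation data, so fonts and spacing would still come from your own stylesheet.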
Sed works well on Tidy output, but is shaky on arbitrary markup. Also, which versions of Word and Tidy are you using?

hth,
Charles Reitzel

At 03:51 PM 1/3/2003 -0800, Hari M wrote:
>What is the best way to get MS Word to HTML?
>
>I have a text box that users can use to enter information to upload to
>their website. Normally users copy and paste from MS Word. I use a WYSIWYG
>rich text box editor that can accept most MS Word formats.
>
>I tried using HTML Tidy as an option to remove the clutter that Word
>inserts - but it messes up the formatting.
>
>Is the best option to convert MS Word to XML and then to HTML?
>Thanks.
>
>I posted a similar question earlier but it did not appear on the list - my
>apologies if this appears twice.
>
>Thanks,
>
>Harry

--
http://cms-list.org/
a wish for peace in the new year.

--
http://cms-list.org/
more signal, less noise.