Another option is Metaverse's XForm Web Service, a .NET Web Service
that converts MS Word documents to XML and includes the XSL
style sheets to convert that XML to nicely formatted HTML.  The XForm Web
Service is server-safe: it is designed for situations where many
users call it at the same time and in high volume.  That's a
big distinction, because the few other tools that will do the job
are, by and large, not fit for high-volume server-side applications.
Word can become unstable and freeze up during automation requests with
some documents, and if you are building a custom content management
system, you never know what kinds of documents your users are going to
throw at it.  You don't want the service to become unavailable for other
users because one user gave it an invalid document.  Just something to
look out for.
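
For anyone rolling their own converter, one common way to guard against
the "one bad document hangs everything" problem is to run each conversion
in its own child process with a time limit.  The Python sketch below is
only an illustration of that idea, not how XForm works internally.  The
function name, the default timeout, and the two stand-in commands are all
made up for the example; a real setup would pass its own converter
command line.

```python
import subprocess
import sys


def convert_isolated(cmd, timeout_s=30.0):
    """Run one conversion in a child process; return (ok, detail)."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a hung
        # converter does not linger and block later requests.
        return False, "timed out"
    if done.returncode != 0:
        return False, done.stderr.strip() or f"exit code {done.returncode}"
    return True, done.stdout


# Stand-ins for a real converter command (hypothetical):
# one that hangs and one that finishes normally.
hung = [sys.executable, "-c", "import time; time.sleep(60)"]
fine = [sys.executable, "-c", "print('<doc/>')"]
```

Calling convert_isolated(hung, timeout_s=1.0) gives up after about a
second and reports a failure, while other requests carry on.  Note that
only the direct child is killed; a converter that spawns further
processes would need extra cleanup.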

You can create a free trial account and see a demo of the XForm web
service at:

http://www.metaverse.cc

or

http://xform.metaverse.cc/xformdemo

Regards,
Doug

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On Behalf Of Rizwan Virk
Sent: Monday, January 13, 2003 8:58 AM
To: Hari M
Cc: [EMAIL PROTECTED]
Subject: RE: [cms-list] Microsoft Word to HTML

Hello Hari,

I definitely recommend the "translate to XML" approach, because you then
have much more control over the HTML that's produced (via an XSL
stylesheet), which matters if you want to do something meaningful with
the HTML.
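
To illustrate the kind of control an intermediate XML step gives you,
here is a minimal Python sketch.  The element names (doc, heading, para),
the CSS class names, and the mapping table are invented for the example
and are not any product's actual schema.  It also uses the standard
library's ElementTree rather than a real XSL transform, since Python's
standard library does not include an XSLT processor; the point is only
that the output HTML is decided by rules you own.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from intermediate XML elements to (HTML tag, CSS class).
RULES = {
    "heading": ("h2", "section-title"),
    "para": ("p", "body-text"),
}


def xml_to_html(xml_text):
    """Convert the made-up <doc> format into an HTML fragment."""
    src = ET.fromstring(xml_text)
    out = ET.Element("div", {"class": "document"})
    for child in src:
        # Unknown elements fall back to a plain paragraph with a marker class.
        tag, css_class = RULES.get(child.tag, ("p", "unknown"))
        el = ET.SubElement(out, tag, {"class": css_class})
        el.text = child.text
    return ET.tostring(out, encoding="unicode", method="html")


sample = "<doc><heading>Intro</heading><para>Hello &amp; welcome.</para></doc>"
print(xml_to_html(sample))
```

Changing the look of every converted document then means editing the
rules, not re-cleaning each document's HTML by hand.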

Our product, the xDoc Converter (www.cambridgedocs.com), splits a Word
file into XML and an XSL stylesheet that can be used to convert it to
HTML.  You can do this for one document or a thousand documents with the
product, and you can also embed it within an application and call it
programmatically.  The product was just officially released, and a fully
functional eval copy can be downloaded from our site,
www.cambridgedocs.com.

Thanks!
Riz


------------------------------
Riz Virk, (617) 905-3518
[EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.cambridgedocs.com


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On Behalf Of Charles Reitzel
Sent: Saturday, January 04, 2003 4:37 PM
To: Hari M
Cc: [EMAIL PROTECTED]
Subject: Re: [cms-list] Microsoft Word to HTML


Hi Hari,

What Tidy options are you using?  Given the amount of markup that gets
chopped out of Word output, some reformatting of the source is
necessary.  But the results are fairly neutral to rendering in a
browser.

If you use the "clean" or "drop-font-tags" options, almost all
presentation data will be dropped.  See

http://tidy.sourceforge.net/docs/quickref.html#drop-font-tags
http://tidy.sourceforge.net/docs/quickref.html#clean
http://tidy.sourceforge.net/docs/quickref.html#output-xhtml
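
As a rough sketch of how those three options might be passed to the tidy
command-line tool from a script, assuming tidy is installed and on the
PATH: the Python helper names below are made up for the example, and the
option names are the ones from the quick reference above.  Later Tidy
releases may no longer accept drop-font-tags, so check the version you
have.

```python
import subprocess


def tidy_command(clean=True, drop_font_tags=True, output_xhtml=True):
    """Build a tidy command line; options not requested stay at Tidy's defaults."""
    cmd = ["tidy", "-quiet"]
    wanted = (
        ("clean", clean),
        ("drop-font-tags", drop_font_tags),
        ("output-xhtml", output_xhtml),
    )
    for name, on in wanted:
        if on:
            cmd += [f"--{name}", "yes"]
    return cmd


def run_tidy(html, **options):
    """Pipe html through tidy and return the cleaned markup."""
    done = subprocess.run(
        tidy_command(**options), input=html, capture_output=True, text=True
    )
    # Tidy exits with 1 when it only issued warnings and 2 on errors.
    if done.returncode > 1:
        raise RuntimeError(done.stderr)
    return done.stdout
```

For example, run_tidy(pasted_html, drop_font_tags=False) would keep font
tags while still applying "clean" and XHTML output.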

You will probably want to add a stylesheet to set fonts, etc.  This you
will need to do on your own.  Sed works well on Tidy output, but is
shaky on arbitrary markup.
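
The same kind of simple text substitution can be sketched in Python.
This assumes the input is Tidy output with a single, well-formed head
element; the function name and the default stylesheet file name
(site.css) are placeholders for the example.  As with sed, it would be
unreliable on arbitrary markup.

```python
import re


def add_stylesheet(xhtml, href="site.css"):
    """Insert a stylesheet <link> just before the first </head>."""
    link = f'<link rel="stylesheet" type="text/css" href="{href}" />\n'
    # A function replacement avoids backslash or group interpretation in `link`.
    result, count = re.subn(
        r"</head>", lambda m: link + m.group(0), xhtml, count=1, flags=re.IGNORECASE
    )
    if count == 0:
        raise ValueError("no </head> found; is this Tidy output?")
    return result
```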

Also, which version of Word and Tidy?

hth,
Charles Reitzel


At 03:51 PM 1/3/2003 -0800, Hari M wrote:

>What is the best way to get MS Word to HTML?
>
>I have a text box that users can use to enter information to upload to
>their website. Normally users copy and paste from MS Word. I use a
>WYSIWYG rich text box editor that can accept most MS Word formats.
>
>I tried using HTML Tidy as an option to remove the clutter that Word
>inserts, but it messes up the formatting.
>
>Is the best option to convert MS Word to XML and then to HTML?
>Thanks.
>
>I posted a similar question earlier but it did not appear on the list.
>My apologies if this appears twice.
>
>Thanks,
>
>Harry
>
>--
>http://cms-list.org/
>a wish for peace in the new year.

--
http://cms-list.org/
more signal, less noise.

