Calle Hedberg wrote:
> Hi,
> 
> Tim has a point with OpenOffice 2, but be aware that the beta version is
> buggy (I got tired of it bombing out on me and removed it until a more
> stable version is available). In particular, I found it nearly impossible to
> open large files (I have lots of Excel pivot table files in the 50-300MB
> range and some large Word files with embedded data). Complex Word files
> (graphics/tables/etc) would often come out "funny".

A 300MB spreadsheet...shudder! I must admit that I haven't used
OpenOffice 2 beta very much, which is perhaps why I haven't encountered a
crash, and any Word files I convert tend to be fairly simple.

> So if you use that kind of tool in batch, I would make sure I "twin" every
> XML version with the original Word file so that users easily can go back to
> the original if they find the converted version messed up. With thousands of
> files converted in batch mode, assume that some of them won't be looked at
> by a sober human for maybe 10 or 15 years.

Perhaps twin the XML with a PDF of the original Word file, since you
don't want those sober humans in 10 or 15 years' time to have to mortgage
their house to buy an annual license for Microsoft Office Longhorn XXXP
2020 which they then have to install their computer onto (by 2020,
computer hardware is very cheap, but proprietary software is very
expensive - due to its tiny market share - so you install special
purpose hardware onto the software in order to run it, not vice-versa as
we do now...).
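
A minimal sketch of the twinning idea in Python, for what it's worth. It
only plans and records the twins: the `convert` callback is a hypothetical
placeholder for whatever actually does the conversion (an OpenOffice macro,
a PyUNO script, etc.), and the `.odt`/`.pdf` extensions and the function
names are my assumptions, not anything OpenOffice prescribes. The originals
are left untouched in the source tree, and failures are collected rather
than aborting the batch, so someone can check the suspect files later.

```python
from pathlib import Path


def plan_twins(src_root, out_root, exts=(".doc",)):
    """Yield (original, xml_twin, pdf_twin) for each Word file under src_root.

    The twins mirror the original's relative path under out_root, so a
    converted file can always be traced back to its source document.
    """
    src_root, out_root = Path(src_root), Path(out_root)
    for original in sorted(src_root.rglob("*")):
        if original.is_file() and original.suffix.lower() in exts:
            base = out_root / original.relative_to(src_root)
            yield original, base.with_suffix(".odt"), base.with_suffix(".pdf")


def convert_batch(src_root, out_root, convert):
    """Call convert(original, target) for both twins of every Word file.

    `convert` is a hypothetical callback supplied by the caller; this
    function does no conversion itself. Returns a list of
    (original, target, error message) tuples for conversions that failed.
    """
    failures = []
    for original, xml_twin, pdf_twin in plan_twins(src_root, out_root):
        xml_twin.parent.mkdir(parents=True, exist_ok=True)
        for target in (xml_twin, pdf_twin):
            try:
                convert(original, target)
            except Exception as exc:  # keep going; report at the end
                failures.append((original, target, str(exc)))
    return failures
```

The returned failure list is the bit that matters for a large batch: it
tells you which originals to look at by hand before anyone needs them.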

Tim C

>>-----Original Message-----
>>From: Tim Churches [mailto:[EMAIL PROTECTED] 
>>Sent: 16 March 2005 06:49 PM
>>To: openhealth-list@minoru-development.com
>>Subject: Re: M$oft Word to XML or HTML conversion
>>
>>Daniel L. Johnson wrote:
>>
>>>Dear All,
>>>
>>>Anybody here know of a tool to convert Microsoft Word files to XML or
>>>HTML?  We have a huge archive of Word files...
>>
>>What sort of XML? MS-Word can save its documents as XML - but 
>>the DTD used is proprietary.
>>
>>As Ignacio said, MS Word can save as HTML, but the resulting 
>>HTML files are full of proprietary Microsoft extensions to 
>>HTML. MS-Word 2002 and later offer a choice to save as 
>>"filtered HTML" which is a bit cleaner, but still horrible.
>>
>>The best way to convert MS-Word files to an open 
>>standards-based XML format is to use a beta version of the 
>>forthcoming OpenOffice 2.0 - see http://www.openoffice.org/  
>>The beta versions work fine, and will save to the OASIS 
>>OpenDocument XML standards (see 
>>http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office ).
>>Actually, I think OpenOffice 1.1.4 also allows you to save to 
>>OpenDocument format, but the OpenOffice 2.0 beta will do a 
>>better job at importing complex MS-Word documents (especially 
>>if they have nested tables).
>>
>>It should be easy to write a macro to automate the 
>>conversion, or you can drive OpenOffice from a Python script 
>>via PyUNO if you are keen.
>>
>>Tim C
