Paul Noone wrote:
DocVert
It's as good a converter as OpenOffice.org is, since OpenOffice.org does the underlying conversion. What Docvert adds is a nice web service interface and a conversion process that's open and extensible through XSLT and XML pipelines, which (to answer your question) can support arbitrary Word styles.

The XML pipelines look like this:

<pipeline>
   <stage process="TransformToDocBook"/>
   <stage process="Transform" withFile="webstyle.xsl"/>
   <stage process="Serialize" toFile="index.html"/>
</pipeline>

This pipeline starts with an OpenDocument file and runs it through these stages in order, resulting in some HTML.
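
To make the "stages in order" idea concrete, here is a minimal Python sketch that reads the pipeline above and lists its stages in document order. It only illustrates the file format as shown in this message and is not Docvert's own code; the function name list_stages is made up for the example.

```python
import xml.etree.ElementTree as ET

# The simple pipeline from this message, copied verbatim.
PIPELINE_XML = """
<pipeline>
   <stage process="TransformToDocBook"/>
   <stage process="Transform" withFile="webstyle.xsl"/>
   <stage process="Serialize" toFile="index.html"/>
</pipeline>
"""

def list_stages(pipeline_xml):
    """Return (process, other_attributes) pairs in document order.

    Hypothetical helper for illustration; Docvert's real loader may differ.
    """
    root = ET.fromstring(pipeline_xml)
    stages = []
    for stage in root.findall("stage"):
        attrs = dict(stage.attrib)
        process = attrs.pop("process")  # every stage shown here has one
        stages.append((process, attrs))
    return stages

if __name__ == "__main__":
    for process, attrs in list_stages(PIPELINE_XML):
        print(process, attrs)
```

Each stage is just a process name plus a few attributes, so adding a stage means adding one more element in the right position.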

For more complex needs, such as serializing to multiple files (e.g. breaking up HTML pages at each heading 1) and making a table of contents, the pipeline might look like this:

<pipeline>
   <stage process="TransformToDocBook"/>
   <stage process="Loop" numberOfTimes="xpathCount://sect1">
       <stage process="SplitPages"/>
       <stage process="Transform" withFile="webstyle.xsl"/>
       <stage process="Serialize" toFile="section{LoopIndex}.html"/>
   </stage>
   <stage process="GetPreface" forSectionLevel="0" splitPagesDepth="1"/>
   <stage process="Transform" withFile="webstyle.xsl"/>
   <stage process="Serialize" toFile="index.html"/>
</pipeline>

As you can see, there are some built-in abstractions for DocBook and for extracting prefaces, so you only need to write the XSLT. If you wanted to work directly with the OpenDocument file, you'd just remove the "TransformToDocBook" stage. And as in any XML pipeline, the result of one transformation is passed to the next.
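
For anyone wondering what the Loop / SplitPages / GetPreface combination amounts to, here is a rough Python sketch of the same idea: count the sect1 elements, write one page per section, and collect whatever is left over as the index page. This is an assumption-laden illustration rather than Docvert's implementation. The sample document, the split_pages function, the .xml file names, and a LoopIndex that starts at 1 are all invented for the example, and the webstyle.xsl transform is left out.

```python
import xml.etree.ElementTree as ET

# Tiny made-up DocBook-like document, for illustration only.
DOCBOOK = """
<article>
  <title>Example</title>
  <para>Preface text before the first section.</para>
  <sect1><title>First</title><para>One.</para></sect1>
  <sect1><title>Second</title><para>Two.</para></sect1>
</article>
"""

def split_pages(docbook_xml):
    """Map output file names to XML fragments, one per sect1 plus an index.

    Hypothetical helper; assumes the loop index starts at 1.
    """
    root = ET.fromstring(docbook_xml)
    pages = {}

    # Equivalent of looping count(//sect1) times and splitting each time.
    for loop_index, section in enumerate(root.findall(".//sect1"), start=1):
        pages[f"section{loop_index}.xml"] = ET.tostring(section, encoding="unicode")

    # Rough equivalent of GetPreface: keep top-level content outside sect1.
    preface = ET.Element(root.tag)
    for child in root:
        if child.tag != "sect1":
            preface.append(child)
    pages["index.xml"] = ET.tostring(preface, encoding="unicode")
    return pages

if __name__ == "__main__":
    for name in sorted(split_pages(DOCBOOK)):
        print(name)
```

In the real pipeline each fragment would then go through the Transform stage with webstyle.xsl before being serialized as HTML.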

I chose OpenOffice.org rather than a more server-side component for compatibility reasons. Once WvWare/Abiword support saving to OpenDocument, I'll add support for that too (version 2.4.1 of Abiword only supports reading OpenDocument files).

If anyone finds any bugs, or wants to add themes or support for other XML output formats, drop me a line :)


.Matthew Cruickshank
http://holloway.co.nz/docvert - Docvert converts MSWord to DocBook to any HTML or XML
*********************************************************
The CMS discussion list for http://webstandardsgroup.org/
*********************************************************
