Paul Noone wrote:
DocVert
Its conversion is as good as OpenOffice.org's, because OpenOffice.org
does the conversion. What Docvert adds is a nice web service interface
and a conversion process that's open and extensible through XSLT and
XML pipelines, which, to answer your question, can support arbitrary
Word styles.
The XML pipelines look like this:
<pipeline>
  <stage process="TransformToDocBook"/>
  <stage process="Transform" withFile="webstyle.xsl"/>
  <stage process="Serialize" toFile="index.html"/>
</pipeline>
This pipeline starts with an OpenDocument file and runs it through each
stage in order, producing HTML.
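To make the "Transform" stage concrete, here is a minimal sketch of what
a stylesheet like webstyle.xsl could contain. It is not the stylesheet
that ships with Docvert. It assumes the DocBook arriving from the
previous stage uses plain, un-namespaced elements such as sect1, title,
para and emphasis, and the HTML structure and class name are made up for
illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical webstyle.xsl: DocBook in, simple HTML out. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Wrap the whole document in an HTML page. -->
  <xsl:template match="/">
    <html>
      <head>
        <!-- Use the first title found as the page title. -->
        <title><xsl:value-of select="(//title)[1]"/></title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Each top-level section becomes a div. -->
  <xsl:template match="sect1">
    <div class="section">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Section titles become h2; any other title becomes h1. -->
  <xsl:template match="sect1/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Paragraphs and inline emphasis map directly to HTML. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="emphasis">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>
```

Elements without a template fall through to XSLT's built-in rules, so
their text still reaches the output, just without any markup around it.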
For more complex needs, such as serializing to multiple files (e.g.,
breaking up HTML pages at each heading 1) and making a table of
contents, the pipeline might look like this:
<pipeline>
  <stage process="TransformToDocBook"/>
  <stage process="Loop" numberOfTimes="xpathCount://sect1">
    <stage process="SplitPages"/>
    <stage process="Transform" withFile="webstyle.xsl"/>
    <stage process="Serialize" toFile="section{LoopIndex}.html"/>
  </stage>
  <stage process="GetPreface" forSectionLevel="0" splitPagesDepth="1"/>
  <stage process="Transform" withFile="webstyle.xsl"/>
  <stage process="Serialize" toFile="index.html"/>
</pipeline>
As you can see, there are some built-in abstractions for DocBook and
for extracting prefaces, so you can just write the XSLT. If you wanted
to work directly with the OpenDocument file, you'd just remove the
"TransformToDocBook" stage. And as in any XML pipeline, the result of
one transformation is passed to the next.
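As a sketch of that OpenDocument-only variant, the simple pipeline with
the "TransformToDocBook" stage removed would look something like the
following. The stylesheet name opendocument-to-html.xsl is hypothetical;
whatever you put there would have to match OpenDocument elements (for
example text:h and text:p in the OpenDocument text namespace) rather
than DocBook ones.

```xml
<pipeline>
  <!-- No TransformToDocBook stage: the XSLT receives OpenDocument XML. -->
  <stage process="Transform" withFile="opendocument-to-html.xsl"/>
  <stage process="Serialize" toFile="index.html"/>
</pipeline>
```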
I chose OpenOffice.org rather than a more server-side component for
compatibility reasons. Once wvWare/AbiWord supports saving to
OpenDocument I'll add support for that too (version 2.4.1 of AbiWord
only supports reading OpenDocument files).
If anyone finds any bugs, or wants to add themes or support for other
XML output formats, drop me a line :)
.Matthew Cruickshank
http://holloway.co.nz/docvert - Docvert converts MSWord to DocBook to
any HTML or XML
*********************************************************
The CMS discussion list for http://webstandardsgroup.org/
*********************************************************