Hi Dave
> I'll help if you want some? Might be an idea to use it with
> http://mercury.ccil.org/~cowan/XML/tagsoup/ tagsoup since most
> html isn't all that clever?
Hmmm... Maybe. But I don't see that this is ever going to be a
generalized converter from HTML to OO. I see it as a step in a
specific pipeline requiring quite good HTML. A very adaptable
generalized converter will need mapping support between HTML and OO
and that will be complicated. More complicated than I want anyway.
For example, I am publishing my CV like this. I maintain the CV in
Emacs org-mode. From there I generate an XOXO microformat file which I
then XSLT into well marked up HTML (with DIVs and things).
I can then use html2oo.xslt to transfer that into OO and from there get
Word or anything else that OO can spit out.
Another example of an application I had in mind is something I built
for Thompson: it built websites out of legal content by converting
their SGML content to XML and then HTML via XSLT. I also had to
convert the XML to Word by using an XSL-FO processor.
But now I would just have a single HTML design with a CSS providing
the look for the web pages and html2oo.xslt producing Word (via
OpenOffice).
Anyway... I've inlined the stylesheet at the bottom. As I said, it's
not comprehensive yet but as I need more elements I will add them.
Right now, I'm controlling the resulting OO file with a Makefile that
looks like this:
doc.odt: doc/content.xml
	bash -c 'cd doc ; zip -r ../doc.odt *'

doc/content.xml: html2oo.xslt doc.html doc
	xsltproc --html html2oo.xslt doc.html > doc/content.xml

doc:
	[ -d doc ] || ( mkdir doc ; unzip -d doc doc.odt )
There are options for making this better but it kinda depends on what
tools you want to use for the XSLT.
If I set up a darcs (http://abridgegame.org/darcs/) repository for this,
would anyone contribute, do you think? Would you?
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:ooow="http://openoffice.org/2004/writer"
xmlns:oooc="http://openoffice.org/2004/calc"
xmlns:dom="http://www.w3.org/2001/xml-events"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!--
Copyright (C) 2006 by Tapsell-Ferrier Limited
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; see the file COPYING. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301 USA
-->
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/html">
<office:document-content office:version="1.0">
<office:scripts/>
<office:font-face-decls>
<style:font-face style:name="StarSymbol"
svg:font-family="StarSymbol" style:font-charset="x-symbol"/>
<style:font-face style:name="DejaVu Sans1"
svg:font-family="'DejaVu Sans'" style:font-pitch="variable"/>
<style:font-face style:name="DejaVu Serif"
svg:font-family="'DejaVu Serif'" style:font-family-generic="roman"
style:font-pitch="variable"/>
<style:font-face style:name="DejaVu Sans"
svg:font-family="'DejaVu Sans'" style:font-family-generic="swiss"
style:font-pitch="variable"/>
</office:font-face-decls>
<office:automatic-styles>
<style:style style:name="Table1" style:family="table">
<style:table-properties style:width="6.925in"
table:align="margins"/>
</style:style>
<style:style style:name="Table1.A" style:family="table-column">
<style:table-column-properties
style:column-width="3.4625in" style:rel-column-width="32767*"/>
</style:style>
<style:style style:name="Table1.A1" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border-left="0.0007in solid #000000" fo:border-right="none"
fo:border-top="0.0007in solid #000000"
fo:border-bottom="0.0007in solid #000000"/>
</style:style>
<style:style style:name="Table1.B1" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border="0.0007in solid #000000"/>
</style:style>
<style:style style:name="Table1.A2" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border-left="0.0007in solid #000000" fo:border-right="none"
fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
</style:style>
<style:style style:name="Table1.B2" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border-left="0.0007in solid #000000"
fo:border-right="0.0007in solid #000000"
fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
</style:style>
<style:style style:name="Table2" style:family="table">
<style:table-properties style:width="6.925in"
table:align="margins"/>
</style:style>
<style:style style:name="Table2.A" style:family="table-column">
<style:table-column-properties
style:column-width="3.4625in" style:rel-column-width="32767*"/>
</style:style>
<style:style style:name="Table2.A1" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border-left="0.0007in solid #000000" fo:border-right="none"
fo:border-top="0.0007in solid #000000"
fo:border-bottom="0.0007in solid #000000"/>
</style:style>
<style:style style:name="Table2.B1" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border="0.0007in solid #000000"/>
</style:style>
<style:style style:name="Table2.A2" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border-left="0.0007in solid #000000" fo:border-right="none"
fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
</style:style>
<style:style style:name="Table2.B2" style:family="table-cell">
<style:table-cell-properties fo:padding="0.0382in"
fo:border-left="0.0007in solid #000000"
fo:border-right="0.0007in solid #000000"
fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
</style:style>
<style:style style:name="P1" style:family="paragraph"
style:parent-style-name="Table_20_Heading">
<style:paragraph-properties fo:text-align="start"
style:justify-single-word="false"/>
<style:text-properties fo:font-style="normal"
fo:font-weight="normal" style:font-style-asian="normal"
style:font-weight-asian="normal" style:font-style-complex="normal"
style:font-weight-complex="normal"/>
</style:style>
<style:style style:name="P2" style:family="paragraph"
style:parent-style-name="Standard" style:list-style-name="L1"/>
<style:style style:name="P3" style:family="paragraph"
style:parent-style-name="Standard" style:list-style-name="L2"/>
<text:list-style style:name="L1">
<text:list-level-style-bullet text:level="1"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="0.25in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="2"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="○">
<style:list-level-properties text:space-before="0.5in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="3"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="■">
<style:list-level-properties text:space-before="0.75in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="4"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="1in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="5"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="○">
<style:list-level-properties text:space-before="1.25in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="6"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="■">
<style:list-level-properties text:space-before="1.5in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="7"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="1.75in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="8"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="○">
<style:list-level-properties text:space-before="2in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="9"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="■">
<style:list-level-properties text:space-before="2.25in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="10"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="2.5in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
</text:list-style>
<text:list-style style:name="L2">
<text:list-level-style-bullet text:level="1"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="0.25in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="2"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="○">
<style:list-level-properties text:space-before="0.5in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="3"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="■">
<style:list-level-properties text:space-before="0.75in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="4"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="1in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="5"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="○">
<style:list-level-properties text:space-before="1.25in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="6"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="■">
<style:list-level-properties text:space-before="1.5in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="7"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="1.75in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="8"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="○">
<style:list-level-properties text:space-before="2in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="9"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="■">
<style:list-level-properties text:space-before="2.25in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
<text:list-level-style-bullet text:level="10"
text:style-name="Bullet_20_Symbols" style:num-suffix="."
text:bullet-char="●">
<style:list-level-properties text:space-before="2.5in"
text:min-label-width="0.25in"/>
<style:text-properties style:font-name="StarSymbol"/>
</text:list-level-style-bullet>
</text:list-style>
</office:automatic-styles>
<xsl:apply-templates select="body"/>
</office:document-content>
</xsl:template>
<xsl:template match="body">
<office:body>
<office:text>
<office:forms form:automatic-focus="false"
form:apply-design-mode="false"/>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0"
text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Table"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Text"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Drawing"/>
</text:sequence-decls>
<xsl:apply-templates select="node()"/>
</office:text>
</office:body>
</xsl:template>
<xsl:template match="h1">
<text:h text:style-name="Heading_20_1"><xsl:apply-templates
select="node()"/></text:h>
</xsl:template>
<xsl:template match="h2">
<text:h text:style-name="Heading_20_2"><xsl:apply-templates
select="node()"/></text:h>
</xsl:template>
<xsl:template match="h3">
<text:h text:style-name="Heading_20_3"><xsl:apply-templates
select="node()"/></text:h>
</xsl:template>
<xsl:template match="h4">
<text:h text:style-name="Heading_20_4"><xsl:apply-templates
select="node()"/></text:h>
</xsl:template>
<!-- The template for p is defined once, near the end of the stylesheet. -->
<xsl:template match="table">
<table:table table:name="Table1" table:style-name="Table1">
<table:table-column table:style-name="Table1.A"
table:number-columns-repeated="2"/>
<!-- FIXME: should not do this...
instead simply apply on node() and have template matches for
tr[th] -->
<xsl:for-each select="tr[th]">
<table:table-header-rows>
<table:table-row>
<xsl:apply-templates select="th|td"/>
</table:table-row>
</table:table-header-rows>
</xsl:for-each>
<xsl:for-each select="tr[td]">
<table:table-row>
<xsl:apply-templates select="td"/>
</table:table-row>
</xsl:for-each>
</table:table>
</xsl:template>
<xsl:template match="th|td">
<table:table-cell table:style-name="Table1.A1"
office:value-type="string">
<xsl:call-template name="text_applyer"/>
</table:table-cell>
</xsl:template>
<xsl:template match="ul">
<text:list text:style-name="L1">
<!-- FIXME: should not do this...
instead simply apply on node() and have template matches for
li -->
<xsl:for-each select="li">
<text:list-item><xsl:call-template
name="text_applyer"/></text:list-item>
</xsl:for-each>
</text:list>
</xsl:template>
<xsl:template name="text_applyer">
<xsl:choose>
<xsl:when test="text()"><text:p
text:style-name="Standard"><xsl:value-of select="."/></text:p>
</xsl:when>
<xsl:otherwise><xsl:apply-templates
select="node()"/></xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="p">
<text:p text:style-name="Standard"><xsl:apply-templates
select="node()"/></text:p>
<text:p text:style-name="Standard"></text:p>
</xsl:template>
</xsl:stylesheet>
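
The two FIXMEs above (the for-each loops in the table and ul templates)
could probably be handled with ordinary template matches. What follows is
only a sketch and has not been run against real documents. It assumes a
second file, here called html2oo-overrides.xslt (a made-up name), sitting
next to html2oo.xslt and importing it, so that the templates below take
precedence over the ones they replace. It also assumes that tr elements are
direct children of table, as the stylesheet above already does. One
difference in behaviour: rows come out in document order, rather than all
header rows first, and a row containing both th and td is emitted once, as
a header row.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">

  <!-- Import the stylesheet above. Templates defined in this file
       have higher import precedence, so they override its versions
       of "table" and "ul". xsl:import must come before any other
       top-level element. -->
  <xsl:import href="html2oo.xslt"/>

  <!-- ul: hand each item to the li template instead of looping. -->
  <xsl:template match="ul">
    <text:list text:style-name="L1">
      <xsl:apply-templates select="li"/>
    </text:list>
  </xsl:template>

  <!-- li: text_applyer is the named template from html2oo.xslt. -->
  <xsl:template match="li">
    <text:list-item>
      <xsl:call-template name="text_applyer"/>
    </text:list-item>
  </xsl:template>

  <!-- table: same wrapper as before, rows dispatched by match. -->
  <xsl:template match="table">
    <table:table table:name="Table1" table:style-name="Table1">
      <table:table-column table:style-name="Table1.A"
                          table:number-columns-repeated="2"/>
      <xsl:apply-templates select="tr"/>
    </table:table>
  </xsl:template>

  <!-- A row with at least one th is treated as a header row. The
       predicate gives this pattern a default priority of 0.5, so it
       wins over the plain "tr" pattern below (priority 0). -->
  <xsl:template match="tr[th]">
    <table:table-header-rows>
      <table:table-row>
        <xsl:apply-templates select="th|td"/>
      </table:table-row>
    </table:table-header-rows>
  </xsl:template>

  <!-- Any other row is an ordinary body row. -->
  <xsl:template match="tr">
    <table:table-row>
      <xsl:apply-templates select="td"/>
    </table:table-row>
  </xsl:template>

</xsl:stylesheet>
```

With that file in place, the xsltproc line in the Makefile would name
html2oo-overrides.xslt instead of html2oo.xslt; nothing else should need to
change.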
--
Nic Ferrier
http://www.tapsellferrier.co.uk for all your tapsell ferrier needs