Hi Dave

>    I'll help if you want some? Might be an idea to use it with
> TagSoup (http://mercury.ccil.org/~cowan/XML/tagsoup/), since most
> HTML isn't all that clever?

Hmmm... Maybe. But I don't see that this is ever going to be a
generalized converter from HTML to OO. I see it as a step in a
specific pipeline requiring quite good HTML. A very adaptable
generalized converter will need mapping support between HTML and OO
and that will be complicated. More complicated than I want anyway.


For example, I am publishing my CV like this. I maintain the CV in
Emacs org-mode. From there I generate an XOXO microformat file which I
then XSLT into well marked up HTML (with DIVs and things).

I can then use html2oo.xslt to transform that into OO and from there get
Word or anything else that OO can spit out.


Another example of an application I had in mind is something I built
for Thompson: it built websites out of legal content by converting
their SGML content to XML and then HTML via XSLT. I also had to
convert the XML to Word by using an XSL-FO processor. 

But now I would just have a single HTML design with a CSS providing
the look for the web pages and html2oo.xslt producing Word (via
OpenOffice).


Anyway... I've inlined the stylesheet at the bottom. As I said, it's
not comprehensive yet but as I need more elements I will add them.
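
As a sketch of what adding an element might look like, here is a
template for HTML links. It is not part of the stylesheet below. It
assumes a link should become an ODF text:a hyperlink, and it only takes
effect where the enclosing template uses apply-templates (as the p
template does) rather than value-of.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
                xmlns:xlink="http://www.w3.org/1999/xlink">

    <!-- Assumption: an HTML link maps to an ODF hyperlink.
         Illustrative only; this template is not in html2oo.xslt. -->
    <xsl:template match="a[@href]">
        <text:a xlink:type="simple" xlink:href="{@href}">
            <!-- Keep processing children so nested markup is still handled -->
            <xsl:apply-templates select="node()"/>
        </text:a>
    </xsl:template>

</xsl:stylesheet>
```

Only the xsl:template element would need copying into html2oo.xslt,
since that stylesheet already declares the text and xlink namespaces.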

Right now, I'm controlling the resulting OO file with a Makefile that
looks like this:

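  # Recipe lines need a leading tab in the real Makefile.
  # doc.odt must already exist the first time round; it is unpacked
  # into doc/ and then repacked with the generated content.xml.
  doc.odt: doc/content.xml
          cd doc && zip -r ../doc.odt *


  doc/content.xml: html2oo.xslt doc.html doc
          xsltproc --html html2oo.xslt doc.html > doc/content.xml

  doc:
          [ -d doc ] || ( mkdir doc && unzip -d doc doc.odt )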


There are options for making this better but it kinda depends on what
tools you want to use for the XSLT.


If I set up a darcs (http://abridgegame.org/darcs/) repository for this,
would anyone contribute, do you think? Would you?


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet  version="1.0"
                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
                 xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
                 xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
                 xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
                 xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
                 xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
                 xmlns:xlink="http://www.w3.org/1999/xlink"
                 xmlns:dc="http://purl.org/dc/elements/1.1/"
                 xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
                 xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
                 xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
                 xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
                 xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
                 xmlns:math="http://www.w3.org/1998/Math/MathML"
                 xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
                 xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
                 xmlns:ooo="http://openoffice.org/2004/office"
                 xmlns:ooow="http://openoffice.org/2004/writer"
                 xmlns:oooc="http://openoffice.org/2004/calc"
                 xmlns:dom="http://www.w3.org/2001/xml-events"
                 xmlns:xforms="http://www.w3.org/2002/xforms"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <!-- 
         Copyright (C) 2006 by Tapsell-Ferrier Limited

         This program is free software; you can redistribute it and/or modify
         it under the terms of the GNU General Public License as published by 
         the Free Software Foundation; either version 2, or (at your option) 
         any later version. 

         This program is distributed in the hope that it will be useful, 
         but WITHOUT ANY WARRANTY; without even the implied warranty of 
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
         GNU General Public License for more details. 
    
         You should have received a copy of the GNU General Public License 
         along with this program; see the file COPYING.  If not, write to the 
         Free Software Foundation, Inc.,   51 Franklin Street, Fifth Floor, 
         Boston, MA  02110-1301  USA 
      -->

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/html">
        <office:document-content office:version="1.0">
            <office:scripts/>
            <office:font-face-decls>
                <style:font-face style:name="StarSymbol" 
svg:font-family="StarSymbol" style:font-charset="x-symbol"/>
                <style:font-face style:name="DejaVu Sans1" 
svg:font-family="'DejaVu Sans'" style:font-pitch="variable"/>
                <style:font-face style:name="DejaVu Serif" 
svg:font-family="'DejaVu Serif'" style:font-family-generic="roman" 
style:font-pitch="variable"/>
                <style:font-face style:name="DejaVu Sans" 
svg:font-family="'DejaVu Sans'" style:font-family-generic="swiss" 
style:font-pitch="variable"/>
            </office:font-face-decls>
            <office:automatic-styles>
                <style:style style:name="Table1" style:family="table">
                    <style:table-properties style:width="6.925in" 
table:align="margins"/>
                </style:style>
                <style:style style:name="Table1.A" style:family="table-column">
                    <style:table-column-properties 
style:column-width="3.4625in" style:rel-column-width="32767*"/>
                </style:style>
                <style:style style:name="Table1.A1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="none"
                        fo:border-top="0.0007in solid #000000"
                        fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table1.B1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in" 
fo:border="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table1.A2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in" 
fo:border-left="0.0007in solid #000000" fo:border-right="none" 
fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table1.B2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000"
                        fo:border-right="0.0007in solid #000000"
                        fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2" style:family="table">
                    <style:table-properties style:width="6.925in" 
table:align="margins"/>
                </style:style>
                <style:style style:name="Table2.A" style:family="table-column">
                    <style:table-column-properties 
style:column-width="3.4625in" style:rel-column-width="32767*"/>
                </style:style>
                <style:style style:name="Table2.A1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="none"
                        fo:border-top="0.0007in solid #000000"
                        fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2.B1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in" 
fo:border="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2.A2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in" 
fo:border-left="0.0007in solid #000000" fo:border-right="none" 
fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2.B2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000"
                        fo:border-right="0.0007in solid #000000"
                        fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="P1" style:family="paragraph" 
style:parent-style-name="Table_20_Heading">
                    <style:paragraph-properties fo:text-align="start" 
style:justify-single-word="false"/>
                    <style:text-properties fo:font-style="normal" 
fo:font-weight="normal" style:font-style-asian="normal" 
style:font-weight-asian="normal" style:font-style-complex="normal" 
style:font-weight-complex="normal"/>
                </style:style>
                <style:style style:name="P2" style:family="paragraph" 
style:parent-style-name="Standard" style:list-style-name="L1"/>
                <style:style style:name="P3" style:family="paragraph" 
style:parent-style-name="Standard" style:list-style-name="L2"/>
                <text:list-style style:name="L1">
                    <text:list-level-style-bullet text:level="1" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="0.25in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="2" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="0.5in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="3" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="0.75in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="4" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="5" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="1.25in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="6" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="1.5in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="7" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1.75in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="8" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="2in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="9" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="2.25in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="10" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="2.5in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                </text:list-style>
                <text:list-style style:name="L2">
                    <text:list-level-style-bullet text:level="1" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="0.25in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="2" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="0.5in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="3" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="0.75in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="4" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="5" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="1.25in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="6" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="1.5in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="7" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1.75in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="8" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="2in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="9" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="2.25in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="10" 
text:style-name="Bullet_20_Symbols" style:num-suffix="." 
text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="2.5in" 
text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                </text:list-style>
            </office:automatic-styles>
            <xsl:apply-templates select="body"/>
        </office:document-content>
    </xsl:template>

    <xsl:template match="body">
        <office:body>
            <office:text>
                <office:forms form:automatic-focus="false" 
form:apply-design-mode="false"/>
                <text:sequence-decls>
                    <text:sequence-decl text:display-outline-level="0" 
text:name="Illustration"/>
                    <text:sequence-decl text:display-outline-level="0" 
text:name="Table"/>
                    <text:sequence-decl text:display-outline-level="0" 
text:name="Text"/>
                    <text:sequence-decl text:display-outline-level="0" 
text:name="Drawing"/>
                </text:sequence-decls>
                <xsl:apply-templates select="node()"/>
            </office:text>
        </office:body>
    </xsl:template>

    <xsl:template match="h1">
        <text:h text:style-name="Heading_20_1"><xsl:apply-templates 
select="node()"/></text:h>
    </xsl:template>

    <xsl:template match="h2">
        <text:h text:style-name="Heading_20_2"><xsl:apply-templates 
select="node()"/></text:h>
    </xsl:template>

      <xsl:template match="h3">
          <text:h text:style-name="Heading_20_3"><xsl:apply-templates 
select="node()"/></text:h>
      </xsl:template>

      <xsl:template match="h4">
          <text:h text:style-name="Heading_20_4"><xsl:apply-templates 
select="node()"/></text:h>
      </xsl:template>




      <xsl:template match="table">
          <table:table table:name="Table1" table:style-name="Table1">
              <table:table-column table:style-name="Table1.A" 
table:number-columns-repeated="2"/>
              <!-- FIXME: should not do this... 
                   instead simply apply on node() and have template matches for 
tr[th] -->
              <xsl:for-each select="tr[th]">
                  <table:table-header-rows>
                      <table:table-row>
                          <xsl:apply-templates select="th|td"/>
                      </table:table-row>
                  </table:table-header-rows>                
              </xsl:for-each>
              <xsl:for-each select="tr[td]">
                  <table:table-row>
                      <xsl:apply-templates select="td"/>
                  </table:table-row>
              </xsl:for-each>
          </table:table>
      </xsl:template>

      <xsl:template match="th|td">
          <table:table-cell table:style-name="Table1.A1" 
office:value-type="string">
              <xsl:call-template name="text_applyer"/>
          </table:table-cell>        
      </xsl:template>

      <xsl:template match="ul">
          <text:list text:style-name="L1">
              <!-- FIXME: should not do this... 
                   instead simply apply on node() and have template matches for 
li -->
              <xsl:for-each select="li">
                  <text:list-item><xsl:call-template 
name="text_applyer"/></text:list-item>
              </xsl:for-each>
          </text:list>
      </xsl:template>

      <xsl:template name="text_applyer">
          <xsl:choose>
              <xsl:when test="text()"><text:p 
text:style-name="Standard"><xsl:value-of select="."/></text:p>
              </xsl:when>
              <xsl:otherwise><xsl:apply-templates 
select="node()"/></xsl:otherwise>
          </xsl:choose>
      </xsl:template>

      <xsl:template match="p">
          <text:p text:style-name="Standard"><xsl:apply-templates 
select="node()"/></text:p>
          <text:p text:style-name="Standard"></text:p>
      </xsl:template>

  </xsl:stylesheet>
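
The two FIXME comments above could be addressed roughly as follows. This
is a minimal sketch, not a tested replacement: the th|td and li
templates here are simplified stand-ins (the real ones would keep
calling text_applyer and keep the Table1.A1 cell style), and unlike the
for-each version a row containing both th and td is emitted once, as a
header row, rather than twice.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
                xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
                xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">

    <xsl:template match="table">
        <table:table table:name="Table1" table:style-name="Table1">
            <table:table-column table:style-name="Table1.A"
                                table:number-columns-repeated="2"/>
            <!-- Header rows first, then plain rows, as before -->
            <xsl:apply-templates select="tr[th]"/>
            <xsl:apply-templates select="tr[td][not(th)]"/>
        </table:table>
    </xsl:template>

    <!-- tr[th] has a higher default priority (0.5) than tr (0),
         so header rows are picked up by this template -->
    <xsl:template match="tr[th]">
        <table:table-header-rows>
            <table:table-row>
                <xsl:apply-templates select="th|td"/>
            </table:table-row>
        </table:table-header-rows>
    </xsl:template>

    <xsl:template match="tr">
        <table:table-row>
            <xsl:apply-templates select="td"/>
        </table:table-row>
    </xsl:template>

    <!-- Simplified stand-in for the th|td template above -->
    <xsl:template match="th|td">
        <table:table-cell office:value-type="string">
            <text:p text:style-name="Standard"><xsl:value-of select="."/></text:p>
        </table:table-cell>
    </xsl:template>

    <xsl:template match="ul">
        <text:list text:style-name="L1">
            <xsl:apply-templates select="li"/>
        </text:list>
    </xsl:template>

    <!-- Simplified stand-in: the real version would call text_applyer -->
    <xsl:template match="li">
        <text:list-item>
            <text:p text:style-name="Standard"><xsl:value-of select="."/></text:p>
        </text:list-item>
    </xsl:template>

</xsl:stylesheet>
```

Moving the row and list-item handling into their own templates means a
later template for, say, nested lists or styled rows can be added
without touching the table and ul templates.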


-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs
