howto howto-html-pdf-publishing.xml

shannon Wed, 03 Jul 2002 13:06:56 -0700

shannon     2002/07/03 13:27:14

  Added:       src/documentation/xdocs/howto howto-html-pdf-publishing.xml
  Log:
  New How-To on publishing
  HTML and PDF docs in Cocoon
  by Bertrand Delacretaz
  [EMAIL PROTECTED]
  
  Revision  Changes    Path
  1.1                  
xml-cocoon2/src/documentation/xdocs/howto/howto-html-pdf-publishing.xml
  
  Index: howto-html-pdf-publishing.xml
  ===================================================================
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" 
"../dtd/document-v10.dtd">
  
  <document>
   <header>
    <title>How to publish XML documents in HTML and PDF</title>
    <authors>
     <person name="Bertrand Delacr&#232;taz" email="[EMAIL PROTECTED]"/>
    </authors>
   </header>
  
   <body>
  
  <s1 title="Overview">
  <p>
  Without requiring any prior knowledge of Cocoon, XSLT or XSL-FO, this How-To shows 
you how to publish XML 
  documents in HTML and PDF using Cocoon.
  <br/>
  The steps below have been tested with Cocoon 2.0.2-dev but should work with any 2.x 
version.  
  </p>
  </s1>
  
  <s1 title="Purpose">
  <p>
  We will build a simple pipeline that converts XML documents into HTML or PDF 
on-the-fly using simple 
  XSLT transforms.
  <br/>
  This is similar to the <em>hello.html</em> and <em>hello.pdf</em> samples of the 
standard Cocoon installation, but here you
  will be building it yourself, which should help you get a better feel for how this 
works. 
  </p>
  </s1>
  
  <s1 title="Intended Audience">
  <p>
  Beginning Cocoon users who want to learn how to publish HTML and/or PDF documents 
from XML data.
  </p>
  </s1>
  
  <s1 title="Prerequisites">
  <p>Here's what you need:</p>  
  
  <ul>
  <li>Cocoon must be running on your system.</li>
  <li>This document assumes a standard installation where
  <link 
href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link> 
points to 
  the <em>mount</em> subdirectory of the Cocoon installation. Calling this URL should 
display a page
  titled "Directory Listing of mount".
  <br/> 
  If your installation runs on a different URL, you will have to adjust
  the URLs given in this document accordingly. 
  </li>
  <li>You must be able to create and edit XML files in the <em>mount</em> subdirectory 
of the Cocoon installation.
  In a standard installation, this is <em>webapps/cocoon/mount</em> under the 
directory of the Tomcat installation. 
  </li>
  </ul>
  <note>You will not need a fancy XML editor for this; copying and pasting the 
examples into any text editor
  will do.</note>
  
  </s1>
  
  <s1 title="Steps">
  <p>
  Here's how to proceed.
  </p>
  
  <s2 title="1. Create the work directory under mount" >
  <p>
  Under <em>webapps/cocoon/mount</em>, create a new directory named <em>html-pdf</em>. 
  All files used by this How-To will reside in this directory.
  <br/>
  After a browser refresh, <link 
href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link> 
  should display the name of this new directory, among others. 
  </p>
  </s2>
  
  <s2 title="2. Create the XML example documents" >
  <p>
  To keep it simple we will use two small XML files as our data source.
  Later, you will probably use other data sources like live XML feeds, databases, etc. 
  </p>
  <p>
  In the <em>html-pdf</em> directory, create the following two files, naming them 
exactly as
  shown.
  </p>
  
  <note>
  Be careful about lower/uppercase in filenames if you're working on a Unix or Linux 
system. 
  On such systems, <em>thisFile.xml</em> is not the same as <em>Thisfile.xml</em>.
  </note>
  <note>
  To avoid any errors, use copy/paste when creating XML documents from examples on 
this page.
  <br/>
  Also, do not leave spaces or blank lines at the start of XML files: the &lt;?xml... 
declaration must begin at the
  very first character of the file.
  </note>
  
  <p>
  Contents of file <strong>pageOne.xml</strong>:
  </p>
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <page>
  <title>This is the pageOne.xml example</title>
  <s1 title="Section one">
      <p>This is the text of section one</p>
  </s1>
  </page>
          ]]></source>
  
  <p>
  Contents of file <strong>pageTwo.xml</strong>:
  </p>
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <page>
  <title>This is the pageTwo.xml example</title>
  <s1 title="Yes, it works">
      <p>Now you're hopefully seeing pageTwo in HTML or PDF</p>
  </s1>
  </page>
          ]]></source>
  
  </s2>
  
  <s2 title="3. Create the XSLT transform for HTML" >
  <p>
  The most common way of producing HTML in Cocoon is to use <em>XSLT transforms</em> 
to select and convert 
  the appropriate elements of the input documents.
  </p>
  
  <p>
  Copy the file shown below to the <em>html-pdf</em> directory alongside your XML 
documents, naming it
  <strong>doc2html.xsl</strong>.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  
  <!-- generate HTML skeleton on root element -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:apply-templates select="page/title"/></title>
      </head>
      <body>
          <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  
  <!-- story is used later by the Meerkat example -->
  <xsl:template match="p|story">
      <p><xsl:apply-templates/></p>
  </xsl:template>
  
  <!-- convert sections to HTML headings -->
  <xsl:template match="s1">
      <h1><xsl:apply-templates select="@title"/></h1>
      <xsl:apply-templates/>
  </xsl:template>
  
  </xsl:stylesheet>     
  ]]></source>
  <note>       
  In short, this stylesheet generates an HTML skeleton and converts the input markup 
to HTML. We won't go
  into details here; our goal is just to show you how the components of the publishing 
chain are combined.  
  </note>
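  <p>
  As a rough illustration, tracing <em>doc2html.xsl</em> by hand over <em>pageOne.xml</em> suggests
  output along the following lines. Exact whitespace, and any extra tags added by the HTML serializer,
  may differ. Note that the page title also shows up as plain text in the body: no template matches
  <em>title</em> there, so XSLT's built-in rules simply copy its text through.
  </p>
         <source><![CDATA[
  <html>
    <head>
      <title>This is the pageOne.xml example</title>
    </head>
    <body>
      This is the pageOne.xml example
      <h1>Section one</h1>
      <p>This is the text of section one</p>
    </body>
  </html>
          ]]></source>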
  
  </s2>
  
  <s2 title="4. Create the sitemap" >
  <p>
  We now have documents to publish, and an XSLT transform to convert them to our HTML 
output format.
  What's left is to connect these together when a request is made to Cocoon - that's 
the role of the <em>sitemap</em>,
  which will select a <em>processing pipeline</em> based on the request received from 
the browser. 
  </p>
  
  <p>
  To tell Cocoon how we want it to process requests made to <em>html-pdf</em>, 
  copy the following contents to a file named <strong>sitemap.xmap</strong> in the 
  <em>html-pdf</em> subdirectory.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  
      <!-- use the standard components -->
      <map:components>
          <map:generators default="file"/>
          <map:transformers default="xslt"/>
          <map:readers default="resource"/>
          <map:serializers default="html"/>
          <map:selectors default="browser"/>
          <map:matchers default="wildcard"/>
      </map:components>
        
      <map:pipelines>
          <map:pipeline>
              <!-- respond to *.html requests with our docs processed by doc2html.xsl 
-->
              <map:match pattern="*.html">
                  <map:generate src="{1}.xml"/>
                  <map:transform src="doc2html.xsl"/>
                  <map:serialize type="html"/>
              </map:match>
              
              <!-- later, respond to *.pdf requests with our docs processed by 
doc2pdf.xsl -->
              <map:match pattern="*.pdf">
                  <map:generate src="{1}.xml"/>
                  <map:transform src="doc2pdf.xsl"/>
                  <map:serialize type="fo2pdf"/>
              </map:match>
          </map:pipeline>
      </map:pipelines>
  </map:sitemap>
          ]]></source>
          
  <note>The important thing here is the first <strong>map:match</strong> element, 
which tells Cocoon how to process
  requests ending in *.html in this directory. Again, we won't go into details here 
but that's where it happens.
  </note>
  <note>The above sitemap is already configured for PDF publishing, but this is not 
usable at this time as we haven't created
  the required XSLT transform yet.</note> 
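  <p>
  To make the wildcard matching more concrete, here is the first <em>map:match</em> element again,
  with comments tracing a request for <em>pageOne.html</em>. This is only an annotated reading of
  the sitemap above, not additional configuration.
  </p>
         <source><![CDATA[
  <!-- request: http://localhost:8080/cocoon/mount/html-pdf/pageOne.html -->
  <map:match pattern="*.html">
      <!-- the * matched "pageOne", so {1} expands to "pageOne" and pageOne.xml is read -->
      <map:generate src="{1}.xml"/>
      <!-- the XML is then transformed by the stylesheet in the same directory -->
      <map:transform src="doc2html.xsl"/>
      <!-- finally the result is sent to the browser as HTML -->
      <map:serialize type="html"/>
  </map:match>
          ]]></source>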
         
  </s2>
  
  <s2 title="5. Test the HTML publishing" >
  <p>
  At this point you should be able to display the results in HTML: 
  </p>
  <ul>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.html">http://localhost:8080/cocoon/mount/html-pdf/pageOne.html</link>
  should display the first page with "Section one" in big letters.
  </li>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html</link>
  should display the second page with "Yes, it works" in big letters.
  </li>
  </ul>
  <note>If this doesn't work, you might want to first double-check the above steps, and 
then look at the Cocoon
  logs in the webapps/cocoon/WEB-INF/logs directory. You will find lots of information 
there: look for clues 
  in files that change in size when the error happens.
  </note>
  </s2>
  
  
  <s2 title="6. Create the XSLT transform for PDF" >
  <p>
  PDF documents are created via XSL-FO documents, which are XML documents that use a 
specific page-description
  vocabulary (see <link href="#references">References</link> below for more info). The 
actual conversion to PDF is done by the 
  <em>PdfSerializer</em> which uses software from <link 
href="http://xml.apache.org/fop">FOP</link>, another Apache
  Software Foundation project.   
  </p>
  
  <p>
  To activate the PDF conversion, copy the file shown below to the <em>html-pdf</em> 
directory alongside your XML documents, naming it
  <strong>doc2pdf.xsl</strong>.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <xsl:stylesheet 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
      xmlns:fo="http://www.w3.org/1999/XSL/Format"
  >
      <!-- generate PDF page structure -->
      <xsl:template match="/">
          <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
              <fo:layout-master-set>
                  <fo:simple-page-master master-name="page"
                    page-height="29.7cm" 
                    page-width="21cm"
                    margin-top="1cm" 
                    margin-bottom="2cm" 
                    margin-left="2.5cm" 
                    margin-right="2.5cm"
                  >
                      <!-- region-body comes first, as the XSL-FO content model requires -->
                      <fo:region-body margin-top="3cm"/>
                      <fo:region-before extent="3cm"/>
                      <fo:region-after extent="1.5cm"/>
                  </fo:simple-page-master>
  
                  <fo:page-sequence-master master-name="all">
                      <fo:repeatable-page-master-alternatives>
                          <fo:conditional-page-master-reference 
master-reference="page" page-position="first"/>
                      </fo:repeatable-page-master-alternatives>
                  </fo:page-sequence-master>
              </fo:layout-master-set>
  
              <fo:page-sequence master-reference="all">
                  <fo:flow flow-name="xsl-region-body">
                      <fo:block><xsl:apply-templates/></fo:block>
                  </fo:flow>
              </fo:page-sequence>
          </fo:root>
      </xsl:template>
  
      <!-- process paragraphs -->
      <xsl:template match="p">
          <fo:block><xsl:apply-templates/></fo:block>
      </xsl:template>
  
      <!-- convert sections to XSL-FO headings -->
      <xsl:template match="s1">
          <fo:block font-size="24pt" color="red" font-weight="bold">
              <xsl:apply-templates select="@title"/>
          </fo:block>
          <xsl:apply-templates/>
      </xsl:template>
  
  </xsl:stylesheet>
  ]]>
         </source>
  <note>This file is already referenced by the sitemap that we created, so no 
additional configuration is needed.</note>       
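  <p>
  If the PDF does not come out as expected, it can help to look at the intermediate XSL-FO document.
  One possible way, sketched below, is an extra <em>map:match</em> element placed next to the existing
  ones, which runs the same transform but serializes the result as plain XML. The <em>*.fo</em> pattern
  is a name made up for this example, and the sketch assumes that a serializer named <em>xml</em> is
  available from the main sitemap, in the same way as the <em>fo2pdf</em> one.
  </p>
         <source><![CDATA[
  <!-- hypothetical debugging aid: show the XSL-FO that would be fed to fo2pdf -->
  <map:match pattern="*.fo">
      <map:generate src="{1}.xml"/>
      <map:transform src="doc2pdf.xsl"/>
      <!-- assumption: an "xml" serializer is declared in the main sitemap -->
      <map:serialize type="xml"/>
  </map:match>
          ]]></source>
  <p>
  With such a match in place, requesting <em>pageOne.fo</em> in the <em>html-pdf</em> directory should
  return the XSL-FO markup instead of the rendered PDF.
  </p>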
  </s2>
  
  <s2 title="7. Test the PDF publishing" >
  <p>
  At this point you should be able to display the results in PDF in addition to the 
existing HTML versions: 
  </p>
  <ul>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf</link>
  should display the first page with "Section one" in big red letters.
  </li>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf</link>
  should display the second page with "Yes, it works" in big red letters.
  </li>
  </ul>
  </s2>
  
  </s1>
  
  <s1 title="Summary">
  <p>
  Hopefully you're beginning to see that this is not too complicated once you know 
what goes where. 
  <br/>
  The nice thing is that all of our huge corpus
  of XML documents (two documents actually, but that's a start...) is processed by just 
two XSLT files, one
  for each target format.
  <br/> 
  Changing the appearance of the published documents would require changing these XSLT 
transforms only, without
  touching the source documents.
  </p>
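  <p>
  As a small illustration of this, the heading template in <em>doc2pdf.xsl</em> could be restyled as
  sketched below. The blue color and the 18pt size are arbitrary values picked for the example; the
  XML source documents stay exactly as they are.
  </p>
         <source><![CDATA[
      <!-- convert sections to XSL-FO headings, with example styling changes -->
      <xsl:template match="s1">
          <fo:block font-size="18pt" color="blue" font-weight="bold">
              <xsl:apply-templates select="@title"/>
          </fo:block>
          <xsl:apply-templates/>
      </xsl:template>
          ]]></source>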
  </s1>
  
  <s1 title="Tips">
  <s2 title="Tip 1: Dynamic XML data">
  <p>
  Using dynamic XML as the data source is very easy as the Cocoon FileGenerator can 
read URLs as well. 
  <br/>
  If you add the map:match element shown in bold below <strong>before</strong> the 
existing map:match elements in your sitemap.xmap file, requesting
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/meerkat.html">http://localhost:8080/cocoon/mount/html-pdf/meerkat.html</link>
  should display real-time news from Meerkat (assuming an Internet connection to 
Meerkat is available).
  <br/>
  The news will be displayed in a very rough format, but this can be made better by 
writing a 
  specific XSLT transform for this Meerkat data and using it instead of doc2html.xsl 
in the meerkat.html pipeline.  
  </p>
  
  <source>
  <![CDATA[
  ...
  <map:pipeline>
  ]]>
  <strong>
  <![CDATA[
  <map:match pattern="meerkat.html">
      <map:generate src="http://www.oreillynet.com/meerkat/?_fl=xml"/>
      <map:transform src="doc2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>
  ]]>
  </strong>
  <![CDATA[
  <map:match pattern="*.html">
  etc...
  ]]>
  </source>
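  <p>
  A Meerkat-specific transform could start out like the sketch below. The file name
  <em>meerkat2html.xsl</em> is made up for this example, and the sketch assumes that each
  <em>story</em> element of the feed has <em>title</em> and <em>description</em> children;
  check the actual feed and adjust the element names before relying on it.
  </p>
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- generate HTML skeleton and process only the story elements -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Meerkat news</title>
      </head>
      <body>
          <xsl:apply-templates select="//story"/>
      </body>
    </html>
  </xsl:template>

  <!-- ASSUMPTION: story has title and description children -->
  <xsl:template match="story">
      <h2><xsl:value-of select="title"/></h2>
      <p><xsl:value-of select="description"/></p>
  </xsl:template>

  </xsl:stylesheet>
          ]]></source>
  <p>
  To try it, save the file in the <em>html-pdf</em> directory and use it as the
  <em>map:transform</em> source in the meerkat.html pipeline instead of doc2html.xsl.
  </p>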
  </s2>
  
  <s2 title="Tip 2: Two-step conversion">
  <p>
  When you are generating multiple formats from a single data source, it is often a 
good idea to first generate
  an intermediate <em>logical document</em> that describes the output in a 
format-neutral way.
  <br/>
  This is obviously not needed in our simple example, but if you're aiming at more 
complicated 
  publishing tasks you might want to read about this "publishing pattern" in Martin 
Fowler's 
  <link href="http://www.martinfowler.com/isa/htmlRenderer.html">Two Step View</link>
  article.
  </p>
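  <p>
  In sitemap terms, a two-step conversion might be sketched as follows, since a pipeline may chain
  several <em>map:transform</em> elements. The three stylesheet names are hypothetical: the first
  would produce the format-neutral logical document, and the other two would render it.
  </p>
         <source><![CDATA[
  <map:match pattern="*.html">
      <map:generate src="{1}.xml"/>
      <!-- step 1 (hypothetical stylesheet): source markup to logical document -->
      <map:transform src="doc2logical.xsl"/>
      <!-- step 2 (hypothetical stylesheet): logical document to HTML -->
      <map:transform src="logical2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>

  <map:match pattern="*.pdf">
      <map:generate src="{1}.xml"/>
      <!-- the same first step is shared by both output formats -->
      <map:transform src="doc2logical.xsl"/>
      <!-- step 2 (hypothetical stylesheet): logical document to XSL-FO -->
      <map:transform src="logical2pdf.xsl"/>
      <map:serialize type="fo2pdf"/>
  </map:match>
          ]]></source>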
  </s2>
  
  </s1>
  
  <s1 title="References">
  <anchor id="references"/>
  <p>
  To go further, you will need to learn about the following technologies and tools:
  </p>
  <ul>
  <li>
  Learning about the 
  <link 
href="http://www.google.com/search?as_sitesearch=xml.apache.org&amp;as_q=cocoon+concepts+sitemap";>
  Cocoon concepts</link> will help you understand how the sitemap, generators, 
transformers and serializers work.
  </li> 
  <li>
  Learning about <link href="http://www.w3.org/Style/XSL/">XSLT</link> will allow you 
to write your own transforms to 
  generate HTML, PDF or other formats from XML data. 
  Information about XSL-FO is available at the same address.  
  </li>
  </ul>
  </s1>
  
  <s1 title="Comments">
  <p>
  Care to comment on this How-To? Got another tip? 
  Help keep this How-To relevant by passing along any useful feedback to the author,
  <link href="mailto:[EMAIL PROTECTED]">Bertrand&#160;Delacr&#232;taz</link>.
  </p>
  </s1>
  
  </body>
  </document>

