shannon     2002/07/03 13:27:14

  Added:       src/documentation/xdocs/howto howto-html-pdf-publishing.xml
  Log:
  New How-To on publishing
  HTML and PDF docs in Cocoon
  by Bertrand Delacretaz
  [EMAIL PROTECTED]
  
  Revision  Changes    Path
  1.1                  
xml-cocoon2/src/documentation/xdocs/howto/howto-html-pdf-publishing.xml
  
  Index: howto-html-pdf-publishing.xml
  ===================================================================
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" 
"../dtd/document-v10.dtd">
  
  <document>
   <header>
    <title>How to publish XML documents in HTML and PDF</title>
    <authors>
     <person name="Bertrand Delacr&#232;taz" email="[EMAIL PROTECTED]"/>
    </authors>
   </header>
  
   <body>
  
  <s1 title="Overview">
  <p>
  Without requiring any prior knowledge of Cocoon, XSLT or XSL-FO, this How-To 
shows you how to publish XML 
  documents in HTML and PDF using Cocoon.
  <br/>
  The steps below have been tested with Cocoon 2.0.2-dev but should work with 
any 2.x version.  
  </p>
  </s1>
  
  <s1 title="Purpose">
  <p>
  We will build a simple pipeline that converts XML documents into HTML or PDF 
on-the-fly using simple 
  XSLT transforms.
  <br/>
  This is similar to the <em>hello.html</em> and <em>hello.pdf</em> samples of 
the standard Cocoon installation, but here you
  will be building it yourself, which should help you get a better feel of how 
this works. 
  </p>
  </s1>
  
  <s1 title="Intended Audience">
  <p>
  Beginning Cocoon users who want to learn how to publish HTML and/or PDF 
documents from XML data.
  </p>
  </s1>
  
  <s1 title="Prerequisites">
  <p>Here's what you need:</p>  
  
  <ul>
  <li>Cocoon must be running on your system.</li>
  <li>This document assumes a standard installation where
  <link 
href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link>
 points to 
  the <em>mount</em> subdirectory of the Cocoon installation. Calling this URL 
should display a page
  titled "Directory Listing of mount".
  <br/> 
  If your installation runs on a different URL, you will have to adjust
  the URLs given in this document accordingly. 
  </li>
  <li>You must be able to create and edit XML files in the <em>mount</em> 
subdirectory of the Cocoon installation.
  In a standard installation, this is <em>webapps/cocoon/mount</em> under the 
directory of the Tomcat installation. 
  </li>
  </ul>
  <note>You will not need a fancy XML editor for this: copying and pasting the 
examples into any text editor
  will do.</note>
  
  </s1>
  
  <s1 title="Steps">
  <p>
  Here's how to proceed.
  </p>
  
  <s2 title="1. Create the work directory under mount" >
  <p>
  Under <em>webapps/cocoon/mount</em>, create a new directory named 
<em>html-pdf</em>. 
  All files used by this How-To will reside in this directory.
  <br/>
  After a browser refresh, <link 
href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link>
 
  should display the name of this new directory, among others. 
  </p>
  </s2>
  
  <s2 title="2. Create the XML example documents" >
  <p>
  To keep it simple we will use two small XML files as our data source.
  Later, you will probably use other data sources like live XML feeds, 
databases, etc. 
  </p>
  <p>
  In the <em>html-pdf</em> directory, create the following two files, naming 
them exactly as
  shown.
  </p>
  
  <note>
  Be careful about lower/uppercase in filenames if you're working on a Unix or 
Linux system. 
  On such systems, <em>thisFile.xml</em> is not the same as 
<em>Thisfile.xml</em>.
  </note>
  <note>
  To avoid any errors, use copy/paste when creating XML documents from examples 
on this page.
  <br/>
  Also, do not leave spaces or blank lines at the start of XML files: the &lt;?xml... 
declaration must
  be the very first thing in the file.
  </note>
  
  <p>
  Contents of file <strong>pageOne.xml</strong>:
  </p>
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <page>
  <title>This is the pageOne.xml example</title>
  <s1 title="Section one">
      <p>This is the text of section one</p>
  </s1>
  </page>
          ]]></source>
  
  <p>
  Contents of file <strong>pageTwo.xml</strong>:
  </p>
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <page>
  <title>This is the pageTwo.xml example</title>
  <s1 title="Yes, it works">
      <p>Now you're hopefully seeing pageTwo in HTML or PDF</p>
  </s1>
  </page>
          ]]></source>
  
  </s2>
  
  <s2 title="3. Create the XSLT transform for HTML" >
  <p>
  The most common way of producing HTML in Cocoon is to use <em>XSLT 
transforms</em> to select and convert 
  the appropriate elements of the input documents.
  </p>
  
  <p>
  Copy the file shown below to the <em>html-pdf</em> directory alongside your 
XML documents, naming it
  <strong>doc2html.xsl</strong>.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  
  <!-- generate HTML skeleton on root element -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:apply-templates select="page/title"/></title>
      </head>
      <body>
          <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  
  <!-- story is used later by the Meerkat example -->
  <xsl:template match="p|story">
      <p><xsl:apply-templates/></p>
  </xsl:template>
  
  <!-- convert sections to HTML headings -->
  <xsl:template match="s1">
      <h1><xsl:apply-templates select="@title"/></h1>
      <xsl:apply-templates/>
  </xsl:template>
  
  </xsl:stylesheet>     
  ]]></source>
  <note>
  This stylesheet generates an HTML skeleton and converts the input markup to HTML. We won't go
  into details here; our goal is just to show you how the components of the publishing chain are combined.
  </note>
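
  <p>
  As a rough illustration of what this transform does, here is approximately what
  <em>pageOne.xml</em> becomes once <em>doc2html.xsl</em> has been applied to it. This is a
  hand-derived sketch, not captured output: whitespace will differ, and the HTML serializer
  may add declarations of its own. Note that the page title also shows up as plain text in the
  body, because no template matches the <em>title</em> element there and the built-in XSLT
  rules simply copy its text.
  </p>
         <source><![CDATA[
  <html>
    <head>
      <title>This is the pageOne.xml example</title>
    </head>
    <body>
      This is the pageOne.xml example
      <h1>Section one</h1>
      <p>This is the text of section one</p>
    </body>
  </html>
          ]]></source>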
  
  </s2>
  
  <s2 title="4. Create the sitemap" >
  <p>
  We now have documents to publish, and an XSLT transform to convert them to 
our HTML output format.
  What's left is to connect these together when a request is made to Cocoon - 
that's the role of the <em>sitemap</em>,
  which will select a <em>processing pipeline</em> based on the request 
received from the browser. 
  </p>
  
  <p>
  To tell Cocoon how we want it to process requests made to <em>html-pdf</em>, 
  copy the following contents to a file named <strong>sitemap.xmap</strong> in 
the 
  <em>html-pdf</em> subdirectory.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  
      <!-- use the standard components -->
      <map:components>
          <map:generators default="file"/>
          <map:transformers default="xslt"/>
          <map:readers default="resource"/>
          <map:serializers default="html"/>
          <map:selectors default="browser"/>
          <map:matchers default="wildcard"/>
      </map:components>
        
      <map:pipelines>
          <map:pipeline>
              <!-- respond to *.html requests with our docs processed by 
doc2html.xsl -->
              <map:match pattern="*.html">
                  <map:generate src="{1}.xml"/>
                  <map:transform src="doc2html.xsl"/>
                  <map:serialize type="html"/>
              </map:match>
              
              <!-- later, respond to *.pdf requests with our docs processed by 
doc2pdf.xsl -->
              <map:match pattern="*.pdf">
                  <map:generate src="{1}.xml"/>
                  <map:transform src="doc2pdf.xsl"/>
                  <map:serialize type="fo2pdf"/>
              </map:match>
          </map:pipeline>
      </map:pipelines>
  </map:sitemap>
          ]]></source>
          
  <note>The important thing here is the first <strong>map:match</strong> 
element, which tells Cocoon how to process
  requests ending in *.html in this directory. Again, we won't go into details 
here but that's where it happens.
  </note>
  <note>The above sitemap is already configured for PDF publishing, but this is 
not usable at this time as we haven't created
  the required XSLT transform yet.</note> 
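
  <p>
  To make the wildcard substitution concrete: for a request to <em>pageOne.html</em>, the
  <em>*</em> in the pattern matches <em>pageOne</em>, and <em>{1}</em> is replaced by that
  value. The first map:match above then behaves as if it had been written out by hand like
  this (shown for illustration only, you do not need to add it to your sitemap):
  </p>
         <source><![CDATA[
  <!-- what the *.html matcher amounts to for a pageOne.html request -->
  <map:match pattern="pageOne.html">
      <map:generate src="pageOne.xml"/>
      <map:transform src="doc2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>
          ]]></source>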
         
  </s2>
  
  <s2 title="5. Test the HTML publishing" >
  <p>
  At this point you should be able to display the results in HTML: 
  </p>
  <ul>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.html">http://localhost:8080/cocoon/mount/html-pdf/pageOne.html</link>
  should display the first page with "Section one" in big letters.
  </li>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html</link>
  should display the second page with "Yes, it works" in big letters.
  </li>
  </ul>
  <note>If this doesn't work, you might want to first double-check the above 
steps, and then look at the Cocoon
  logs in the webapps/cocoon/WEB-INF/logs directory. You will find lots of 
information there: look for clues 
  in files that change in size when the error happens.
  </note>
  </s2>
  
  
  <s2 title="6. Create the XSLT transform for PDF" >
  <p>
  PDF documents are created via XSL-FO documents, which are XML documents that 
use a specific page-description
  vocabulary (see <link href="#references">References</link> below for more 
info). The actual conversion to PDF is done by the 
  <em>PdfSerializer</em> which uses software from <link 
href="http://xml.apache.org/fop">FOP</link>, another Apache
  Software Foundation project.   
  </p>
  
  <p>
  To activate the PDF conversion, copy the file shown below to the 
<em>html-pdf</em> directory alongside your XML documents, naming it
  <strong>doc2pdf.xsl</strong>.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <xsl:stylesheet 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
      xmlns:fo="http://www.w3.org/1999/XSL/Format"
  >
      <!-- generate PDF page structure -->
      <xsl:template match="/">
          <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
              <fo:layout-master-set>
                  <fo:simple-page-master master-name="page"
                    page-height="29.7cm" 
                    page-width="21cm"
                    margin-top="1cm" 
                    margin-bottom="2cm" 
                    margin-left="2.5cm" 
                    margin-right="2.5cm"
                  >
                      <fo:region-before extent="3cm"/>
                      <fo:region-body margin-top="3cm"/>
                      <fo:region-after extent="1.5cm"/>
                  </fo:simple-page-master>
  
                  <fo:page-sequence-master master-name="all">
                      <fo:repeatable-page-master-alternatives>
                          <fo:conditional-page-master-reference 
master-reference="page" page-position="first"/>
                      </fo:repeatable-page-master-alternatives>
                  </fo:page-sequence-master>
              </fo:layout-master-set>
  
              <fo:page-sequence master-reference="all">
                  <fo:flow flow-name="xsl-region-body">
                      <fo:block><xsl:apply-templates/></fo:block>
                  </fo:flow>
              </fo:page-sequence>
          </fo:root>
      </xsl:template>
  
      <!-- process paragraphs -->
      <xsl:template match="p">
          <fo:block><xsl:apply-templates/></fo:block>
      </xsl:template>
  
      <!-- convert sections to XSL-FO headings -->
      <xsl:template match="s1">
          <fo:block font-size="24pt" color="red" font-weight="bold">
              <xsl:apply-templates select="@title"/>
          </fo:block>
          <xsl:apply-templates/>
      </xsl:template>
  
  </xsl:stylesheet>
  ]]>
         </source>
  <note>This file is already referenced by the sitemap that we created, so no 
additional configuration is needed.</note>       
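
  <p>
  For orientation, the body of the XSL-FO document that this transform should produce from
  <em>pageOne.xml</em> looks roughly like the fragment below. This is a hand-derived sketch:
  the layout-master-set is omitted, whitespace will differ, and the <em>xmlns:fo</em>
  declaration normally appears on the fo:root element (it is repeated here only so that the
  fragment stands alone).
  </p>
         <source><![CDATA[
  <fo:page-sequence master-reference="all"
      xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:flow flow-name="xsl-region-body">
          <fo:block>
              This is the pageOne.xml example
              <!-- produced by the s1 template -->
              <fo:block font-size="24pt" color="red" font-weight="bold">Section one</fo:block>
              <!-- produced by the p template -->
              <fo:block>This is the text of section one</fo:block>
          </fo:block>
      </fo:flow>
  </fo:page-sequence>
          ]]></source>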
  </s2>
  
  <s2 title="7. Test the PDF publishing" >
  <p>
  At this point you should be able to display the results in PDF in addition to 
the existing HTML versions: 
  </p>
  <ul>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf</link>
  should display the first page with "Section one" in big red letters.
  </li>
  <li>
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf</link>
  should display the second page with "Yes, it works" in big red letters.
  </li>
  </ul>
  </s2>
  
  </s1>
  
  <s1 title="Summary">
  <p>
  Hopefully you're beginning to see that this is not too complicated once you 
know what goes where. 
  <br/>
  The nice thing is that all of our huge corpus
  of XML documents (two documents actually, but that's a start...) is processed 
by just two XSLT files, one
  for each target format.
  <br/> 
  Changing the appearance of the published documents would require changing 
these XSLT transforms only, without
  touching the source documents.
  </p>
  </s1>
  
  <s1 title="Tips">
  <s2 title="Tip 1: Dynamic XML data">
  <p>
  Using dynamic XML as the data source is very easy as the Cocoon FileGenerator 
can read URLs as well. 
  <br/>
  If you add the map:match element shown in bold below <strong>before</strong> 
the existing map:match elements in your sitemap.xmap file, requesting
  <link 
href="http://localhost:8080/cocoon/mount/html-pdf/meerkat.html">http://localhost:8080/cocoon/mount/html-pdf/meerkat.html</link>
  should display real-time news from Meerkat (assuming an Internet connection 
to Meerkat is available).
  <br/>
  The news will be displayed in a very rough format, but this can be made 
better by writing a 
  specific XSLT transform for this Meerkat data and using it instead of 
doc2html.xsl in the meerkat.html pipeline.  
  </p>
  
  <source>
  <![CDATA[
  ...
  <map:pipeline>
  ]]>
  <strong>
  <![CDATA[
  <map:match pattern="meerkat.html">
      <map:generate src="http://www.oreillynet.com/meerkat/?_fl=xml"/>
      <map:transform src="doc2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>
  ]]>
  </strong>
  <![CDATA[
  <map:match pattern="*.html">
  etc...
  ]]>
  </source>
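
  <p>
  To use a Meerkat-specific stylesheet as suggested above, only the map:transform line needs
  to change. A minimal sketch, assuming you have written a stylesheet named
  <em>meerkat2html.xsl</em> (a hypothetical name, this file is not part of this How-To) and
  saved it in the <em>html-pdf</em> directory:
  </p>
         <source><![CDATA[
  <map:match pattern="meerkat.html">
      <map:generate src="http://www.oreillynet.com/meerkat/?_fl=xml"/>
      <!-- meerkat2html.xsl is a hypothetical stylesheet written for the Meerkat format -->
      <map:transform src="meerkat2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>
          ]]></source>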
  </s2>
  
  <s2 title="Tip 2: Two-step conversion">
  <p>
  When you are generating multiple formats from a single data source, it is 
often a good idea to first generate
  an intermediate <em>logical document</em> that describes the output in a 
format-neutral way.
  <br/>
  This is obviously not needed in our simple example, but if you're aiming at 
more complicated 
  publishing tasks you might want to read about this "publishing pattern" in 
Martin Fowler's 
  <link href="http://www.martinfowler.com/isa/htmlRenderer.html">Two Step 
View</link>
  article.
  </p>
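
  <p>
  In a Cocoon sitemap, a two-step conversion can be expressed by chaining two map:transform
  steps in one pipeline. The sketch below assumes two hypothetical stylesheets, neither of
  which is part of this How-To: <em>doc2logical.xsl</em>, which would convert the source
  markup to a format-neutral logical document, and <em>logical2html.xsl</em>, which would
  render that logical document as HTML.
  </p>
         <source><![CDATA[
  <map:match pattern="*.html">
      <map:generate src="{1}.xml"/>
      <!-- step 1: source markup to logical document (hypothetical stylesheet) -->
      <map:transform src="doc2logical.xsl"/>
      <!-- step 2: logical document to HTML (hypothetical stylesheet) -->
      <map:transform src="logical2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>
          ]]></source>
  <p>
  A matching <em>*.pdf</em> pipeline would reuse the same first step and swap only the second
  stylesheet and the serializer, so the format-neutral logic is written once.
  </p>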
  </s2>
  
  </s1>
  
  <s1 title="References">
  <anchor id="references"/>
  <p>
  To go further, you will need to learn about the following technologies and 
tools:
  </p>
  <ul>
  <li>
  Learning about the 
  <link 
href="http://www.google.com/search?as_sitesearch=xml.apache.org&amp;as_q=cocoon+concepts+sitemap">
  Cocoon concepts</link> will help you understand how the sitemap, generators, 
transformers and serializers work.
  </li> 
  <li>
  Learning about <link href="http://www.w3.org/Style/XSL/">XSLT</link> will 
allow you to write your own transforms to 
  generate HTML, PDF or other formats from XML data. 
  Information about XSL-FO is available at the same address.  
  </li>
  </ul>
  </s1>
  
  <s1 title="Comments">
  <p>
  Care to comment on this How-To? Got another tip? 
  Help keep this How-To relevant by passing along any useful feedback to the 
author,
  <link href="mailto:[EMAIL PROTECTED]">Bertrand&#160;Delacr&#232;taz</link>.
  </p>
  </s1>
  
  </body>
  </document>
  
  
  
