shannon 2002/07/03 13:27:14
Added: src/documentation/xdocs/howto howto-html-pdf-publishing.xml
Log:
New How-To on publishing
HTML and PDF docs in Cocoon
by Bertrand Delacretaz
[EMAIL PROTECTED]
Revision Changes Path
1.1
xml-cocoon2/src/documentation/xdocs/howto/howto-html-pdf-publishing.xml
Index: howto-html-pdf-publishing.xml
===================================================================
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN"
"../dtd/document-v10.dtd">
<document>
<header>
<title>How to publish XML documents in HTML and PDF</title>
<authors>
<person name="Bertrand Delacrètaz" email="[EMAIL PROTECTED]"/>
</authors>
</header>
<body>
<s1 title="Overview">
<p>
Without requiring any prior knowledge of Cocoon, XSLT or XSL-FO, this How-To shows
you how to publish XML
documents in HTML and PDF using Cocoon.
<br/>
The steps below have been tested with Cocoon 2.0.2-dev but should work with any 2.x
version.
</p>
</s1>
<s1 title="Purpose">
<p>
We will build a simple pipeline that converts XML documents into HTML or PDF
on-the-fly using simple
XSLT transforms.
<br/>
This is similar to the <em>hello.html</em> and <em>hello.pdf</em> samples of the
standard Cocoon installation, but here you
will be building it yourself, which should help you get a better feel for how this
works.
</p>
</s1>
<s1 title="Intended Audience">
<p>
Beginning Cocoon users who want to learn how to publish HTML and/or PDF documents
from XML data.
</p>
</s1>
<s1 title="Prerequisites">
<p>Here's what you need:</p>
<ul>
<li>Cocoon must be running on your system.</li>
<li>This document assumes a standard installation where
<link
href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link>
points to
the <em>mount</em> subdirectory of the Cocoon installation. Calling this URL should
display a page
titled "Directory Listing of mount".
<br/>
If your installation runs on a different URL, you will have to adjust
the URLs given in this document accordingly.
</li>
<li>You must be able to create and edit XML files in the <em>mount</em> subdirectory
of the Cocoon installation.
In a standard installation, this is <em>webapps/cocoon/mount</em> under the
directory of the Tomcat installation.
</li>
</ul>
<note>You will not need a fancy XML editor for this; copying and pasting the
examples into any text editor
will do.</note>
</s1>
<s1 title="Steps">
<p>
Here's how to proceed.
</p>
<s2 title="1. Create the work directory under mount" >
<p>
Under <em>webapps/cocoon/mount</em>, create a new directory named <em>html-pdf</em>.
All files used by this How-To will reside in this directory.
<br/>
After a browser refresh, <link
href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link>
should display the name of this new directory, among others.
</p>
</s2>
<s2 title="2. Create the XML example documents" >
<p>
To keep it simple we will use two small XML files as our data source.
Later, you will probably use other data sources like live XML feeds, databases, etc.
</p>
<p>
In the <em>html-pdf</em> directory, create the following two files, naming them
exactly as
shown.
</p>
<note>
Be careful about lower/uppercase in filenames if you're working on a Unix or Linux
system.
On such systems, <em>thisFile.xml</em> is not the same as <em>Thisfile.xml</em>.
</note>
<note>
To avoid any errors, use copy/paste when creating XML documents from examples on
this page.
<br/>
Also, do not leave spaces at the start of XML files: the &lt;?xml... declaration
must
start at the very first character of the file.
</note>
<p>
Contents of file <strong>pageOne.xml</strong>:
</p>
<source><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<page>
<title>This is the pageOne.xml example</title>
<s1 title="Section one">
<p>This is the text of section one</p>
</s1>
</page>
]]></source>
<p>
Contents of file <strong>pageTwo.xml</strong>:
</p>
<source><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<page>
<title>This is the pageTwo.xml example</title>
<s1 title="Yes, it works">
<p>Now you're hopefully seeing pageTwo in HTML or PDF</p>
</s1>
</page>
]]></source>
</s2>
<s2 title="3. Create the XSLT transform for HTML" >
<p>
The most common way of producing HTML in Cocoon is to use <em>XSLT transforms</em>
to select and convert
the appropriate elements of the input documents.
</p>
<p>
Copy the file shown below to the <em>html-pdf</em> directory alongside your XML
documents, naming it
<strong>doc2html.xsl</strong>.
</p>
<source><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- generate HTML skeleton on root element -->
<xsl:template match="/">
<html>
<head>
<title><xsl:apply-templates select="page/title"/></title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<!-- story is used later by the Meerkat example -->
<xsl:template match="p|story">
<p><xsl:apply-templates/></p>
</xsl:template>
<!-- convert sections to HTML headings -->
<xsl:template match="s1">
<h1><xsl:apply-templates select="@title"/></h1>
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
]]></source>
<note>
In short, this stylesheet generates an HTML skeleton and converts the input markup
to HTML. We won't go
into details here; our goal is just to show you how the components of the publishing
chain are combined.
</note>
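<p>
To make the mapping concrete, here is roughly what this transform should produce for
<em>pageOne.xml</em>. This is a sketch traced by hand from the templates above, not captured
output: whitespace will differ, and the HTML serializer may add details of its own. Note that the
page title also shows up as plain text in the body, because no template handles the <em>title</em>
element there and XSLT then simply copies its text.
</p>
<source><![CDATA[
<html>
<head>
<title>This is the pageOne.xml example</title>
</head>
<body>
This is the pageOne.xml example
<h1>Section one</h1>
<p>This is the text of section one</p>
</body>
</html>
]]></source>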
</s2>
<s2 title="4. Create the sitemap" >
<p>
We now have documents to publish, and an XSLT transform to convert them to our HTML
output format.
What's left is to connect these together when a request is made to Cocoon - that's
the role of the <em>sitemap</em>,
which will select a <em>processing pipeline</em> based on the request received from
the browser.
</p>
<p>
To tell Cocoon how we want it to process requests made to <em>html-pdf</em>,
copy the following contents to a file named <strong>sitemap.xmap</strong> in the
<em>html-pdf</em> subdirectory.
</p>
<source><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
<!-- use the standard components -->
<map:components>
<map:generators default="file"/>
<map:transformers default="xslt"/>
<map:readers default="resource"/>
<map:serializers default="html"/>
<map:selectors default="browser"/>
<map:matchers default="wildcard"/>
</map:components>
<map:pipelines>
<map:pipeline>
<!-- respond to *.html requests with our docs processed by doc2html.xsl -->
<map:match pattern="*.html">
<map:generate src="{1}.xml"/>
<map:transform src="doc2html.xsl"/>
<map:serialize type="html"/>
</map:match>
<!-- later, respond to *.pdf requests with our docs processed by doc2pdf.xsl -->
<map:match pattern="*.pdf">
<map:generate src="{1}.xml"/>
<map:transform src="doc2pdf.xsl"/>
<map:serialize type="fo2pdf"/>
</map:match>
</map:pipeline>
</map:pipelines>
</map:sitemap>
]]></source>
<note>The important thing here is the first <strong>map:match</strong> element,
which tells Cocoon how to process
requests ending in *.html in this directory. Again, we won't go into details here
but that's where it happens.
</note>
<note>The above sitemap is already configured for PDF publishing, but this is not
usable at this time as we haven't created
the required XSLT transform yet.</note>
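<p>
As an illustration, here is the first <strong>map:match</strong> element again, with comments
tracing what should happen to a request for <em>pageOne.html</em>. The comments are our own
reading of the pipeline, added for explanation only; the elements themselves are unchanged.
</p>
<source><![CDATA[
<!-- "pageOne.html" matches the *.html pattern; the part matched by * is "pageOne" -->
<map:match pattern="*.html">
<!-- {1} is replaced by "pageOne", so the generator reads pageOne.xml -->
<map:generate src="{1}.xml"/>
<!-- the XML from the generator is converted by our XSLT transform -->
<map:transform src="doc2html.xsl"/>
<!-- the result is written out as HTML and sent to the browser -->
<map:serialize type="html"/>
</map:match>
]]></source>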
</s2>
<s2 title="5. Test the HTML publishing" >
<p>
At this point you should be able to display the results in HTML:
</p>
<ul>
<li>
<link
href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.html">http://localhost:8080/cocoon/mount/html-pdf/pageOne.html</link>
should display the first page with "Section one" in big letters.
</li>
<li>
<link
href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html</link>
should display the second page with "Yes, it works" in big letters.
</li>
</ul>
<note>If this doesn't work, you might want to first double-check the above steps, and
then look at the Cocoon
logs in the webapps/cocoon/WEB-INF/logs directory. You will find lots of information
there: look for clues
in files that change in size when the error happens.
</note>
</s2>
<s2 title="6. Create the XSLT transform for PDF" >
<p>
PDF documents are created via XSL-FO documents, which are XML documents that use a
specific page-description
vocabulary (see <link href="#references">References</link> below for more info). The
actual conversion to PDF is done by the
<em>PdfSerializer</em> which uses software from <link
href="http://xml.apache.org/fop">FOP</link>, another Apache
Software Foundation project.
</p>
<p>
To activate the PDF conversion, copy the file shown below to the <em>html-pdf</em>
directory alongside your XML documents, naming it
<strong>doc2pdf.xsl</strong>.
</p>
<source><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
>
<!-- generate PDF page structure -->
<xsl:template match="/">
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="page"
page-height="29.7cm"
page-width="21cm"
margin-top="1cm"
margin-bottom="2cm"
margin-left="2.5cm"
margin-right="2.5cm"
>
<!-- region-body comes first, as required by the XSL-FO content model -->
<fo:region-body margin-top="3cm"/>
<fo:region-before extent="3cm"/>
<fo:region-after extent="1.5cm"/>
</fo:simple-page-master>
<fo:page-sequence-master master-name="all">
<fo:repeatable-page-master-alternatives>
<fo:conditional-page-master-reference
master-reference="page" page-position="first"/>
</fo:repeatable-page-master-alternatives>
</fo:page-sequence-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="all">
<fo:flow flow-name="xsl-region-body">
<fo:block><xsl:apply-templates/></fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
<!-- process paragraphs -->
<xsl:template match="p">
<fo:block><xsl:apply-templates/></fo:block>
</xsl:template>
<!-- convert sections to XSL-FO headings -->
<xsl:template match="s1">
<fo:block font-size="24pt" color="red" font-weight="bold">
<xsl:apply-templates select="@title"/>
</fo:block>
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
]]></source>
<note>This file is already referenced by the sitemap that we created, so no
additional configuration is needed.</note>
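<p>
The transform above has no template for the <em>title</em> element of the page, so its text
is output as plain text at the top of the PDF. As a minimal sketch of how you might extend the
stylesheet, the template below could be added inside the <em>xsl:stylesheet</em> element of
doc2pdf.xsl. The font size, alignment and spacing values are arbitrary choices made for this
example.
</p>
<source><![CDATA[
<!-- illustrative addition: render the page title as a centered block -->
<xsl:template match="page/title">
<fo:block font-size="14pt" text-align="center" space-after="12pt">
<xsl:apply-templates/>
</fo:block>
</xsl:template>
]]></source>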
</s2>
<s2 title="7. Test the PDF publishing" >
<p>
At this point you should be able to display the results in PDF in addition to the
existing HTML versions:
</p>
<ul>
<li>
<link
href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf</link>
should display the first page with "Section one" in big red letters.
</li>
<li>
<link
href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf</link>
should display the second page with "Yes, it works" in big red letters.
</li>
</ul>
</s2>
</s1>
<s1 title="Summary">
<p>
Hopefully you're beginning to see that this is not too complicated once you know
what goes where.
<br/>
The nice thing is that all of our huge corpus
of XML documents (two documents actually, but that's a start...) is processed by just
two XSLT files, one
for each target format.
<br/>
Changing the appearance of the published documents would require changing these XSLT
transforms only, without
touching the source documents.
</p>
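<p>
For instance, to change how section headings look in the HTML version, it should be enough to
replace the <em>s1</em> template in doc2html.xsl with something like the sketch below. The
heading level and the color are arbitrary choices for illustration; pageOne.xml and pageTwo.xml
stay exactly as they are.
</p>
<source><![CDATA[
<!-- illustrative replacement for the s1 template in doc2html.xsl -->
<xsl:template match="s1">
<h2 style="color: navy"><xsl:apply-templates select="@title"/></h2>
<xsl:apply-templates/>
</xsl:template>
]]></source>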
</s1>
<s1 title="Tips">
<s2 title="Tip 1: Dynamic XML data">
<p>
Using dynamic XML as the data source is very easy as the Cocoon FileGenerator can
read URLs as well.
<br/>
If you add the map:match element shown in bold below <strong>before</strong> the
existing map:match elements in your sitemap.xmap file, requesting
<link
href="http://localhost:8080/cocoon/mount/html-pdf/meerkat.html">http://localhost:8080/cocoon/mount/html-pdf/meerkat.html</link>
should display real-time news from Meerkat (assuming an Internet connection to
Meerkat is available).
<br/>
The news will be displayed in a very rough format, but this can be made better by
writing a
specific XSLT transform for this Meerkat data and using it instead of doc2html.xsl
in the meerkat.html pipeline.
</p>
<source>
<![CDATA[
...
<map:pipeline>
]]>
<strong>
<![CDATA[
<map:match pattern="meerkat.html">
<map:generate src="http://www.oreillynet.com/meerkat/?_fl=xml"/>
<map:transform src="doc2html.xsl"/>
<map:serialize type="html"/>
</map:match>
]]>
</strong>
<![CDATA[
<map:match pattern="*.html">
etc...
]]>
</source>
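<p>
As a rough sketch of such a dedicated transform, the stylesheet below could be saved as
<em>meerkat2html.xsl</em> (a hypothetical name) and referenced in the meerkat.html pipeline
instead of doc2html.xsl. It assumes that each <em>story</em> element has <em>title</em> and
<em>description</em> children; this is an assumption about the feed, so check the actual
Meerkat output before relying on it.
</p>
<source><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- generate HTML skeleton on root element -->
<xsl:template match="/">
<html>
<head>
<title>Meerkat news</title>
</head>
<body>
<!-- process the story elements only, wherever they occur -->
<xsl:apply-templates select="//story"/>
</body>
</html>
</xsl:template>
<!-- assumption: each story has title and description children -->
<xsl:template match="story">
<h2><xsl:value-of select="title"/></h2>
<p><xsl:value-of select="description"/></p>
</xsl:template>
</xsl:stylesheet>
]]></source>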
</s2>
<s2 title="Tip 2: Two-step conversion">
<p>
When you are generating multiple formats from a single data source, it is often a
good idea to first generate
an intermediate <em>logical document</em> that describes the output in a
format-neutral way.
<br/>
This is obviously not needed in our simple example, but if you're aiming at more
complicated
publishing tasks you might want to read about this "publishing pattern" in Martin
Fowler's
<link href="http://www.martinfowler.com/isa/htmlRenderer.html">Two Step View</link>
article.
</p>
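<p>
In sitemap terms, a two-step conversion could look like the sketch below, where each pipeline
chains two transforms. The stylesheet names <em>doc2logical.xsl</em>, <em>logical2html.xsl</em>
and <em>logical2pdf.xsl</em> are hypothetical and are not part of this How-To; you would have to
write them yourself.
</p>
<source><![CDATA[
<map:match pattern="*.html">
<map:generate src="{1}.xml"/>
<!-- step 1: source document to format-neutral logical document -->
<map:transform src="doc2logical.xsl"/>
<!-- step 2: logical document to HTML -->
<map:transform src="logical2html.xsl"/>
<map:serialize type="html"/>
</map:match>
<map:match pattern="*.pdf">
<map:generate src="{1}.xml"/>
<!-- step 1 is shared with the HTML pipeline -->
<map:transform src="doc2logical.xsl"/>
<!-- step 2: logical document to XSL-FO -->
<map:transform src="logical2pdf.xsl"/>
<map:serialize type="fo2pdf"/>
</map:match>
]]></source>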
</s2>
</s1>
<s1 title="References">
<anchor id="references"/>
<p>
To go further, you will need to learn about the following technologies and tools:
</p>
<ul>
<li>
Learning about the
<link
href="http://www.google.com/search?as_sitesearch=xml.apache.org&amp;as_q=cocoon+concepts+sitemap">
Cocoon concepts</link> will help you understand how the sitemap, generators,
transformers and serializers work.
</li>
<li>
Learning about <link href="http://www.w3.org/Style/XSL/">XSLT</link> will allow you
to write your own transforms to
generate HTML, PDF or other formats from XML data.
Information about XSL-FO is available at the same address.
</li>
</ul>
</s1>
<s1 title="Comments">
<p>
Care to comment on this How-To? Got another tip?
Help keep this How-To relevant by passing along any useful feedback to the author,
<link href="mailto:[EMAIL PROTECTED]">Bertrand Delacrètaz</link>.
</p>
</s1>
</body>
</document>