The attached patch cleans up the Entity Catalogs doco ...
- reflects the new way of auto-loading the default catalog
- explains that the properties file is now for local use
- adds a local configuration example for DocBook DTDs
cheers, David
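
For anyone reviewing the patch who has not used catalogs before, here is a rough sketch of the lookup-then-fall-back behaviour that the new paragraph in the "Implementation" section describes. It is not the Sun resolver code, and it is not what org.apache.cocoon.components.resolver does internally. The class MapEntityResolver, its methods, and the identifiers used in main are all hypothetical, and a plain map stands in for a real catalog file. The only real API used is the SAX EntityResolver interface, where returning null tells the parser to open the declared system identifier itself.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a catalog: two in-memory maps instead of catalog files.
class MapEntityResolver implements EntityResolver {

    private final Map<String, String> publicMappings = new HashMap<>();
    private final Map<String, String> systemMappings = new HashMap<>();

    // Map a public identifier to an alternate (e.g. local) system identifier.
    void mapPublic(String publicId, String alternateUri) {
        publicMappings.put(publicId, alternateUri);
    }

    // Map a declared system identifier to an alternate system identifier.
    void mapSystem(String systemId, String alternateUri) {
        systemMappings.put(systemId, alternateUri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String alternate = null;
        // Assumption: the public identifier is consulted first. Real catalogs
        // apply their own preference rules here.
        if (publicId != null) {
            alternate = publicMappings.get(publicId);
        }
        if (alternate == null && systemId != null) {
            alternate = systemMappings.get(systemId);
        }
        if (alternate == null) {
            // No mapping: returning null lets the SAX parser carry on and
            // retrieve the resource from the declared system identifier.
            return null;
        }
        InputSource source = new InputSource(alternate);
        source.setPublicId(publicId);
        return source;
    }

    public static void main(String[] args) {
        MapEntityResolver resolver = new MapEntityResolver();
        // Hypothetical identifiers, for illustration only.
        resolver.mapPublic("-//EXAMPLE//DTD Demo V1.0//EN", "file:///tmp/demo.dtd");

        InputSource mapped = resolver.resolveEntity(
                "-//EXAMPLE//DTD Demo V1.0//EN", "http://example.org/demo.dtd");
        InputSource unmapped = resolver.resolveEntity(
                "-//EXAMPLE//DTD Other V1.0//EN", "http://example.org/other.dtd");

        System.out.println(mapped == null ? "fallback" : mapped.getSystemId());
        System.out.println(unmapped == null ? "fallback" : unmapped.getSystemId());
    }
}
```

A SAX parser would pick up such a resolver through XMLReader.setEntityResolver(). Checking the public identifier before the system identifier is a simplification made for the sketch.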
Index: catalog.xml
===================================================================
RCS file: /home/cvspublic/xml-cocoon2/xdocs/catalog.xml,v
retrieving revision 1.3
diff -u -r1.3 catalog.xml
--- catalog.xml 2001/09/03 11:41:07     1.3
+++ catalog.xml 2001/09/07 07:43:07
@@ -1,12 +1,13 @@
 <?xml version="1.0"?>
 
-<!DOCTYPE document SYSTEM "dtd/document-v10.dtd">
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN"
+   "dtd/document-v10.dtd">
 
 <document>
  <header>
   <title>Entity resolution with catalogs</title>
   <subtitle>Resolve external entities to local or other resources</subtitle>
-  <version>1.3</version> 
+  <version>1.4</version> 
   <type>Technical document</type> 
   <authors>
    <person name="David Crossley" email="[EMAIL PROTECTED]"/>
@@ -16,20 +17,34 @@
  <body>
  <s1 title="Introduction">
   <p>
-   @docname@ has the capability to utilise an entity resolution mechanism. This
-   assists with entity management and also reduces the necessity for expensive
-   and failure-prone network retrieval of the required resources (e.g. DTDs,
-   character entity sets, XML sub-documents).
+   @docname@ has the capability to utilise an entity resolution mechanism. 
+   External entities (e.g. Document Type Definitions (DTDs), character entity
+   sets, XML sub-documents) are resources that are declared by an XML instance
+   document - they exist as separate objects. An entity catalog assists with
+   entity management and the resolution of entities to accessible resources.
+   It also reduces the necessity for expensive and failure-prone network
+   retrieval of the required resources.
   </p>
  </s1>
 
  <s1 title="Overview">
   <p>
-   "Entities" represent the physical structure of an XML instance document, whereas 
"elements" represent the logical structure. The complete entity structure of the 
document defines which pieces need to be incorporated, so as to build the final 
document. Those entities are objects from some accessible place, e.g. local file 
system, local network, remote network, generated from a database. Example entities 
are: DTDs, XML sub-documents, sets of character entities to represent symbols and 
other glyphs, image files.
+   "Entities" represent the physical structure of an XML instance document,
+   whereas "elements" represent the logical structure. The complete entity
+   structure of the document defines which pieces need to be incorporated, so
+   as to build the final document. Those entities are objects from some
+   accessible place, e.g. local file system, local network, remote network,
+   generated from a database. Example entities are: DTDs, XML sub-documents,
+   sets of character entities to represent symbols and other glyphs, image
+   files.
   </p>
 
   <p>
-   So how are you going to define the accessible location of all those pieces? How 
will you ensure that those resources are reliably available? Entity resolution 
catalogs to the rescue. These are simple standards-based plain-text files to map 
public identifiers and system identifiers to local or other resources.
+   So how are you going to define the accessible location of all those pieces?
+   How will you ensure that those resources are reliably available? Entity
+   resolution catalogs to the rescue. These are simple standards-based
+   plain-text files to map public identifiers and system identifiers to local
+   or other resources.
   </p>
 
   <p>
@@ -61,25 +76,25 @@
    <li>
     <link href="#demo2">Demonstration #2</link>
      - explains more detailed need and use of catalogs
+     and shows catalogs in action
    </li>
    <li>
     <link href="#imp">Implementation and default configuration</link>
      - describes how support for catalogs is added to @docname@ and
-     explain the default configuration (which should work out-of-the-box)
+     explains the default automated configuration
    </li>
    <li>
     <link href="#config">Local configuration</link>
      - explains how to extend the default configuration for your local
-     system reqirements and provides an example
+     system requirements and provides an example
    </li>
    <li>
     <link href="#dev">Development notes</link>
-     - default catalog support is now in the 2.1-dev branch
-     - needs to confirm operation on all major platforms
+     - some minor issues need to be addressed
    </li>
    <li>
     <link href="#notes">Other notes</link>
-     - assorted notes
+     - assorted dot-points
    </li>
    <li>
     <link href="#summ">Summary</link>
@@ -94,22 +109,29 @@
  <anchor id="background"/>
  <s1 title="Background">
   <p>
-   The following article eloquently describes the need for all
-parsers and XML frameworks to be capable of utilising entity
-resolvers.
+   The following article eloquently describes the need for all parsers and
+   XML frameworks to be capable of utilising entity resolvers.
    "<link 
href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">If
 You Can Name It, You Can Claim It!</link>"
-   by Norman Walsh. Please read that document, then return here to apply entity 
catalogs to @docname@.
+   by Norman Walsh. Please read that document, then return here to apply
+   entity catalogs to @docname@.
   </p>
 
   <p>
-   (Note: That article (and Java classes) evolved to become the Sun 
<code>resolver.zip</code> Java package that has been added to @docname@ - a more 
recent version of the article is available with the Sun download (see below). The API 
javadocs from your build have further information. However, you do not need to know 
the gory details to understand catalogs and configure them.)
+   (Note: That article (and Java classes) evolved to become the Sun
+   <code>resolver.zip</code> Java package that has been added to @docname@
+   - a more recent version of the article is available with the Sun download
+   (see below). The API javadocs from your build have further information.
+   However, you do not need to know the gory details to understand catalogs
+   and configure them.)
   </p>
  </s1>
 
  <anchor id="demo1"/>
  <s1 title="Demonstration #1">
   <p>
-   This snippet from an XML instance shows the Document Type Declaration. Notice that 
it declares its ruleset, the Document Type Definition (DTD), as an external entity. 
Notice also that the resource is network-based.
+   This snippet from an XML instance shows the Document Type Declaration.
+   Notice that it declares its ruleset, the Document Type Definition (DTD),
+   as an external entity. Notice also that the resource is network-based.
   </p>
 
 <source><![CDATA[
@@ -122,15 +144,25 @@
 ]]></source>
 
   <p>
-   Now consider what will happen when @docname@ tries to process this XML instance. 
Whether you have set validation=yes or not, the parser will still want to resolve all 
of the entities that are required by the XML instance (i.e. the DTD and any other 
entities that the DTD might declare). So it will happily trundle across the network to 
get them. It will do this every time that the document is processed. This is obviously 
a needless overhead. Worse still, what happens if that host is down or the network is 
congested. Additionally, if your @docname@ is an off-line server then it is always 
broken because it cannot retrieve the network-based resources.
+   Now consider what will happen when @docname@ tries to process this XML
+   instance. Whether you have set validation=yes or not, the parser will
+   still want to resolve all of the entities that are required by the XML
+   instance (i.e. the DTD and any other entities that the DTD might declare).
+   So it will happily trundle across the network to get them. It will do this
+   every time that the document is processed. This is obviously a needless
+   overhead. Worse still, what happens if that host is down or the network is
+   congested? Additionally, if your @docname@ is an off-line server then it is
+   always broken because it cannot retrieve the network-based resources.
   </p>
  </s1>
 
  <anchor id="cat"/>
  <s1 title="Catalogs overview">
   <p>
-   As the Walsh document explained, the secrets to entity resolution are the public 
identifiers, system identifiers, and the catalog to map between them. Here we provide 
an overview and show an example catalog which we will then use with the
-   <link href="#demo2">Demonstration #2</link> below.
+   As the Walsh document explained, the secrets to entity resolution are the
+   public identifiers, system identifiers, and the catalog to map between them.
+   Here we provide an overview and show an example catalog which we will then
+   use with the <link href="#demo2">Demonstration #2</link> below.
   </p>
 
   <s2 title="External entity declarations">
@@ -156,8 +188,6 @@
   "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd">
 ]]></source>
 
-<note>TODO: briefly explain each of those declarations</note>
-
   <p>
    (In your XML instance document, or DTD, you would include those entities
    like this ... <code>%ISOnum;</code>)
@@ -243,11 +273,10 @@
    role, and each included external entity reports how it came into being.
    This example builds upon the example provided by the Walsh article.
    (Tip: To see the error message that would result from not using a catalog,
-   simply rename the default properties file or default catalog file before
-   starting @docname@.)
+   simply rename the default catalog file before starting @docname@.)
   </p>
 
-<note>TODO: ensure that the link to samples works OK</note>
+<note>TODO: ensure that the link to samples works OK in the various documentation 
+situations (i.e. static site, local docs build)</note>
 
   <p>Here is the source for the top-level XML instance document
    <code>test.xml</code> ...
@@ -349,26 +378,31 @@
    The SAX <code>Parser</code> interface provides an <code>entityResolver</code>
     hook to allow an application to resolve the external entities. The Sun
     Microsystems Java code for "<code>resolver.jar</code>" provides a
-    CatalogManager. This is incorporated into @doctitle@ as
-    <code>org.apache.cocoon.components.resolver</code> and configuration is
-    achieved via the <code>CatalogManager.properties</code> file.
+    CatalogManager. This is incorporated into @docname@ as
+    <code>org.apache.cocoon.components.resolver</code> and local configuration
+    is achieved via the <code>CatalogManager.properties</code> file.
   </p>
 
   <ul>
    <li>A default catalog and some base entities (e.g. ISO*.pen character
-    entity sets) are included in the @doctitle@ distribution at 
+    entity sets) are included in the @docname@ distribution at 
     <code>webapps/cocoon/resources/entities/</code>
    </li>
-   <li>A default annotated <code>CatalogManager.properties</code> file is 
-    included with the distribution (see the Build Notes below).
+   <li>The default catalog is automatically loaded at startup.
    </li>
-   <li>The automatic default configuration should work out-of-the-box</li>
+   <li>An annotated <code>CatalogManager.properties</code> file is included
+    with the distribution to facilitate the configuration of local catalogs.
+   </li>
+   <li>The automatic default configuration should work out-of-the-box.</li>
   </ul>
 
-  <note>TODO: We need to explain the properties file here in doco (the internal
-   annotation helps for now) ... full documentation is available with the
-   Sun download.
-  </note>
+  <p>
+   When the parser needs to load a declared entity, then it first consults
+   the Catalog Manager to get a possible mapping to an alternate system
+   identifier. If there is no mapping for an identifier in the catalogs
+   (or in any sub-ordinate catalogs), then @docname@ will carry on to
+   retrieve the resource using the original declared system identifier.
+  </p>
 
   <p>
    If you suspect problems, then you can raise the level of the
@@ -376,43 +410,27 @@
    to stdout when @docname@ starts and operates. You would also do this to
    detect any misconfiguration of your own catalogs.
   </p>
-
-  <s2 title="Build Notes">
-   <p>
-    Use the following options to your build command ...
-    <br/><code>-Dinclude.webapp.libs=yes</code>
-    <br/><code>-Dinstall.war=$TOMCAT_HOME/webapps install</code>
-   </p>
-
-   <p>
-    This allows the build process to copy the properties file from
-<code>$COCOON_HOME/webapp/resources/entities/CatalogManager.properties</code>
-    to
-<code>$TOMCAT_HOME/webapps/cocoon/WEB-INF/classes/CatalogManager.properties</code>
-    thereby making it available to the Java classpath. The build process will
-    also automatically adjust the full pathname for the default catalog to suit
-    your local directory structure.
-   </p>
-
-   <p>
-    If you see an error message going to STDOUT when @docname@ starts 
-    (<code>Cannot find CatalogManager.properties</code>) then this means that
-    the properties file is not available to the Java classpath. Please ensure
-    that you build as described above, or edit and move the properties file
-    into place manually.
-   </p>
-  </s2>
  </s1>
 
  <anchor id="config"/>
  <s1 title="Local configuration">
   <p>
-   You can add your own catalog by appending another full pathname to
-   the <code>catalogs</code> property in the default properties file
-   (see notes inside the properties file).
+   You can add your own local catalogs using the <code>catalogs</code> property
+   in the default properties file (see the notes inside the properties file).
   </p>
 
   <p>
+   The build process will automatically copy the properties file from
+<code>$COCOON_HOME/webapp/resources/entities/CatalogManager.properties</code>
+   to
+<code>$TOMCAT_HOME/webapps/cocoon/WEB-INF/classes/CatalogManager.properties</code>
+   thereby making it available to the Java classpath.
+   If you see an error message going to STDOUT when @docname@ starts 
+   (<code>Cannot find CatalogManager.properties</code>) then this means that
+   the properties file is not available to the Java classpath.
+  </p>
+
+  <p>
    The actual "catalog" files have a powerful set of directives. 
    For example, the <strong>CATALOG</strong> directive facilitates the
    inclusion of a sub-ordinate catalog. The list of resources below will
@@ -481,30 +499,12 @@
    <li>5) ? What other default entities need to be shipped with the @docname@
     distribution? We already have some character entity sets (ISO*.pen).
    </li>
-   <li>6) Future: It would be nice to have the 
-    <code>org.apache.cocoon.components.resolver</code> classes
-    automatically load the default catalog, thereby leaving the
-    <code>properties</code> config file totally free for local use.
+   <li>7)
    </li>
   </ul>
 
-  <p>
-   Platform testing so far ...
-  </p>
-
-  <ol>
-   <li>Linux Red Hat 7.1, java.vm.version=Blackdown-1.3.1-FCS,
-    Tomcat 3.2.2 ... OK</li>
-   <li>Win2K, Tomcat 3.3 ... OK</li>
-   <li>Windows 2000 Professional, Tomcat 3.2.3 and Tomcat 3.2.1 ... OK</li>
-   <li>Macintosh ... looking for success story</li>
-   <li>Other Windows ... looking for success story</li>
-   <li>Other UNIX ... looking for success story</li>
-   <li>Other JDK versions ... looking for success story</li>
-  </ol>
-
   <p>
-   Some core @docname@ FIXME notes can be addressed by catalog ...
+   Some core @docname@ FIXME notes can now be addressed by catalog ...
   </p>
 
   <ul>
@@ -514,7 +514,7 @@
    <li>there are various other hard-coded pathnames to XML resources
    </li>
    <li>this needs further investigation after basic catalog support is
-    implemented
+    fully settled
    </li>
   </ul>
 
@@ -534,12 +534,9 @@
    <li>There has been a recent flood of XML tools - unfortunately, many do not
     implement entity resolution (other than by brute-force retrieval), so
     those tools are crippled and cannot be used for serious XML processing.
-    Please ensure that you choose proper XML tools for the preparation and
-    vaildation of your XML instance documents.
-   </li>
-   <li>If there is no mapping for an identifier in the catalog (or in any
-    sub-ordinate catalogs), then @docname@ will carry on to retrieve the
-    resource using the declared system identifier.
+    Please ensure that you choose 
+    <link href="http://www.oasis-open.org/cover/">proper XML tools</link>
+    for the preparation and validation of your XML instance documents.
    </li>
    <li>The default catalog that is shipped with the @docname@ distribution is
     deliberately basic. You will need to supplement it with your own catalog
@@ -551,11 +548,23 @@
  <anchor id="summ"/>
  <s1 title="Summary">
   <p>
-   Most XML documents that we would want to serve with @docname@ are already in 
existence in another information system. The XML document instances have a declaration 
of their DTD Document Type Definition as an external file. This external DTD also 
includes entity sets such as ISOnum, ISOlat1, etc. Also the DTD declaration has a 
Formal Public Identifier and a System Identifier which points to a remote URL. These 
XML instance documents cannot be altered to make workaround solutions like 
<code>../dtd/document-1.0.dtd</code>
+   Most XML documents that we would want to serve with @docname@ are already
+   in existence in another information system. The XML document instances have
+   a declaration of their DTD Document Type Definition as an external file.
+   This external DTD also includes entity sets such as ISOnum, ISOlat1, etc.
+   Also the DTD declaration has a Formal Public Identifier and a System
+   Identifier which points to a remote URL. These XML instance documents cannot
+   be altered to make workaround solutions like
+   <code>../dtd/document-1.0.dtd</code>.
   </p>
 
   <p>
-   Entity management is effected by providing a standards-based mechanism to resolve 
public identifiers and system identifiers to local filenames or other identifiers or 
even to other remote network resources. So references to external DTDs, sets of 
character entities such as mathematical symbols, fragments of XML documents, complete 
sub-documents, non-xml data chunks (like images), etc. can all be centrally managed 
and resolved locally.
+   Entity management is effected by providing a standards-based mechanism to
+   resolve public identifiers and system identifiers to local filenames or
+   other identifiers or even to other remote network resources. So references
+   to external DTDs, sets of character entities such as mathematical symbols,
+   fragments of XML documents, complete sub-documents, non-xml data chunks
+   (like images), etc. can all be centrally managed and resolved locally.
   </p>
  </s1>
 
@@ -568,7 +577,8 @@
   <ul>
    <li><link href="http://www.oasis-open.org/committees/entity/">OASIS Entity
     Resolution Technical Committee</link> - see especially the
-    <link href="http://www.oasis-open.org/specs/a401.html">specification for OASIS 
Catalogs</link> (TR 9401:1995 Entity Management)
+    <link href="http://www.oasis-open.org/specs/a401.html">specification for
+    OASIS Catalogs</link> (TR 9401:1995 Entity Management)
     and the 
     <link href="http://www.oasis-open.org/committees/entity/spec.html">specification 
for XML Catalogs</link>
    </li>