rdonkin 2004/01/24 03:22:31
Modified: digester/src/java/org/apache/commons/digester Digester.java
package.html
Log:
Added some documentation on register() and on external entities. The content is
probably a little bit controversial, but people using just system identifiers is a
pet peeve of mine. Feel free to amend or add different viewpoints :)
Revision Changes Path
1.91 +20 -6
jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java
Index: Digester.java
===================================================================
RCS file:
/home/cvs/jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java,v
retrieving revision 1.90
retrieving revision 1.91
diff -u -r1.90 -r1.91
--- Digester.java 10 Jan 2004 17:34:17 -0000 1.90
+++ Digester.java 24 Jan 2004 11:22:31 -0000 1.91
@@ -1660,9 +1660,23 @@
/**
- * Register the specified DTD URL for the specified public identifier.
+ * <p>Register the specified DTD URL for the specified public identifier.
* This must be called before the first call to <code>parse()</code>.
- *
+ * </p><p>
+ * <code>Digester</code> contains an internal <code>EntityResolver</code>
+ * implementation. This maps <code>PUBLIC</code> identifiers to URLs
+ * (from which the resource will be loaded). A common use case for this
+ * method is to register local URLs (possibly computed at runtime by a
+ * classloader) for DTDs. This gives the performance benefit of using
+ * a local copy without requiring that every <code>SYSTEM</code>
+ * URI in every processed XML document be local. This implementation provides
+ * only basic functionality. If more sophisticated features are required,
+ * using {@link #setEntityResolver} to set a custom resolver is recommended.
+ * </p><p>
+ * <strong>Note:</strong> This method will have no effect when a custom
+ * <code>EntityResolver</code> has been set. (Setting a custom
+ * <code>EntityResolver</code> overrides the internal implementation.)
+ * </p>
* @param publicId Public identifier of the DTD to be resolved
* @param entityURL The URL to use for reading this DTD
*/
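To make "maps public identifiers to URLs" concrete in SAX terms, here is a minimal, JDK-only sketch of such a resolver. The class name `PublicIdResolver` and its `register()` method are hypothetical and only mirror the idea described in the Javadoc above. This is not Digester's own code, and Digester's real implementation may differ in detail.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: a public-identifier -> URL resolver in the spirit of
// the internal one described above. Not Digester's actual source.
public class PublicIdResolver implements EntityResolver {
    // publicId -> URL string (for example, one computed from a ClassLoader resource)
    private final Map<String, String> entities = new HashMap<>();

    // Hypothetical registration method, analogous in intent to Digester.register()
    public void register(String publicId, String entityURL) {
        entities.put(publicId, entityURL);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String url = (publicId == null) ? null : entities.get(publicId);
        if (url == null) {
            // Unknown or missing public id: null tells the parser to use systemId itself
            return null;
        }
        // Known public id: read from the registered (typically local) URL instead
        return new InputSource(url);
    }

    public static void main(String[] args) throws Exception {
        PublicIdResolver resolver = new PublicIdResolver();
        resolver.register("-//Example Dot Com //DTD Sample Example//EN",
                "file:///tmp/sample.dtd");
        InputSource hit = resolver.resolveEntity(
                "-//Example Dot Com //DTD Sample Example//EN",
                "http://example.com/sample.dtd");
        InputSource miss = resolver.resolveEntity(null, "http://example.com/sample.dtd");
        System.out.println(hit.getSystemId()); // file:///tmp/sample.dtd
        System.out.println(miss == null);      // true
    }
}
```

Returning `null` for an unknown public identifier is the standard SAX way of saying "no local copy, fall back to the system identifier", which matches the behaviour the text asks for.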
1.26 +75 -0
jakarta-commons/digester/src/java/org/apache/commons/digester/package.html
Index: package.html
===================================================================
RCS file:
/home/cvs/jakarta-commons/digester/src/java/org/apache/commons/digester/package.html,v
retrieving revision 1.25
retrieving revision 1.26
diff -u -r1.25 -r1.26
--- package.html 13 Jan 2004 20:23:25 -0000 1.25
+++ package.html 24 Jan 2004 11:22:31 -0000 1.26
@@ -19,6 +19,7 @@
<a href="#doc.Namespace">[Namespace Aware Parsing]</a>
<a href="#doc.Pluggable">[Pluggable Rules Processing]</a>
<a href="#doc.RuleSets">[Encapsulated Rule Sets]</a>
+<a href="#doc.RegisteringDTDs">[Registering DTDs]</a>
<a href="#doc.troubleshooting">[Troubleshooting]</a>
<a href="#doc.FAQ">[FAQ]</a>
<a href="#doc.Limits">[Known Limitations]</a>
@@ -993,6 +994,80 @@
the same set of nested elements at different nesting levels within an
XML document.</li>
</ul>
+<a name="doc.RegisteringDTDs"></a>
+<h3>Registering DTDs</h3>
+
+<h4>Brief (But Still Too Long) Introduction To System and Public Identifiers</h4>
+<p>A definition for an external entity comes in one of two forms:
+</p>
+<ol>
+ <li><code>SYSTEM <em>system-identifier</em></code></li>
+ <li><code>PUBLIC <em>public-identifier</em> <em>system-identifier</em></code></li>
+</ol>
+<p>
+The <code><em>system-identifier</em></code> is a URI from which the resource can be obtained
+(either directly or indirectly). Many valid URIs may identify the same resource.
+The <code><em>public-identifier</em></code> is an additional, freely chosen identifier that may be used
+(by the parser) to locate the resource.
+</p>
+<p>
+In practice, the weakness of a <code><em>system-identifier</em></code> is that most parsers
+will attempt to interpret this URI as a URL, try to download the resource directly
+from that URL and stop parsing if the download fails. This means that
+the URI will almost always have to be a URL from which the declaration
+can be downloaded.
+</p>
+<p>
+URLs may be local or remote, but a local URL is likely to work correctly
+only on a small number of machines (those configured precisely
+to allow the XML to be parsed). This is usually unsatisfactory, so a universally
+accessible URL is preferred. This usually means an internet URL.
+</p>
+<p>
+To recap, in practice the <code><em>system-identifier</em></code> will (most likely) be an
+internet URL. Unfortunately, downloading from an internet URL is not only slow
+but also unreliable, since a successful download depends on the client being connected
+to the internet and on the server being able to satisfy the request.
+</p>
+<p>
+The <code><em>public-identifier</em></code> is a freely defined name, but in practice it is
+strongly recommended that a unique, readable and open format be used (for reasons
+that should become clear later). A Formal Public Identifier (FPI) is a very
+common choice. The public identifier is often used to provide a unique, location-independent
+key that can be used to substitute local resources for remote ones
+(hint: that is why a unique name matters).
+</p>
+<p>
+By using the second (<code>PUBLIC</code>) form together with a local
+catalog (which maps <code><em>public-identifier</em></code>s to local resources), where
+the <code><em>public-identifier</em></code> is a unique name and the <code><em>system-identifier</em></code>
+is an internet URL, the practical disadvantages of specifying just a
+<code><em>system-identifier</em></code> can be avoided. External entities that have been
+stored locally (on the machine parsing the document) can be identified and used.
+Only when no local copy exists is it necessary to download the document
+from the internet URL. This naming scheme is recommended when using <code>Digester</code>.
+</p>
+
+<h4>External Entity Resolution Using Digester</h4>
+<p>
+SAX factors out the resolution of external entities into an <code>EntityResolver</code>.
+<code>Digester</code> supports the use of a custom <code>EntityResolver</code>
+but ships with a simple internal implementation. This implementation allows local URLs
+to be easily associated with <code><em>public-identifier</em></code>s.
+<p>For example:</p>
+<code><pre>
+ digester.register("-//Example Dot Com //DTD Sample Example//EN", "assets/sample.dtd");
+</pre></code>
+<p>
+will make <code>Digester</code> return the relative file path <code>assets/sample.dtd</code>
+whenever an external entity with public id
+<code>-//Example Dot Com //DTD Sample Example//EN</code> is needed.
+</p>
+<p><strong>Note:</strong> This is a simple (but useful) implementation.
+Greater sophistication requires a custom <code>EntityResolver</code>.</p>
+
<a name="doc.troubleshooting"></a>
<h3>Troubleshooting</h3>
<h4>Debugging Exceptions</h4>
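To illustrate the <code>PUBLIC</code> plus <code>SYSTEM</code> argument in the package.html text, the sketch below parses a document whose system identifier points at an unreachable placeholder URL (`http://example.invalid/sample.dtd`), while its public identifier is found in a small in-memory catalog, so nothing is downloaded. It uses only the JDK's SAX API, not Digester. The class name `LocalCatalogDemo` and the `CATALOG` map are hypothetical stand-ins for a real catalog.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class LocalCatalogDemo {
    static final String PUBLIC_ID = "-//Example Dot Com //DTD Sample Example//EN";

    // Hypothetical in-memory catalog: public identifier -> local DTD text.
    static final Map<String, String> CATALOG = new HashMap<>();
    static {
        CATALOG.put(PUBLIC_ID, "<!ELEMENT sample (#PCDATA)>");
    }

    // Parses with DTD validation and returns the document's text content.
    static String parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                String dtd = (publicId == null) ? null : CATALOG.get(publicId);
                if (dtd == null) {
                    // No local copy: null lets the parser open systemId itself.
                    return null;
                }
                // Local copy found: the remote systemId is never fetched.
                InputSource source = new InputSource(new StringReader(dtd));
                source.setPublicId(publicId);
                source.setSystemId(systemId);
                return source;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // make validity errors fatal
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<!DOCTYPE sample PUBLIC \"" + PUBLIC_ID + "\" "
                + "\"http://example.invalid/sample.dtd\">"
                + "<sample>hello</sample>";
        System.out.println(parse(doc)); // hello
    }
}
```

When the public identifier is missing or unknown, the sketch returns `null` and the parser falls back to fetching the system identifier, which is exactly the slow and fragile path that a <code>SYSTEM</code>-only declaration always takes.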