tng 2003/01/24 11:59:56
Modified: c/doc faq-parse.xml program-dom.xml program.xml
Log:
Add FAQ about how entity references are handled by DOMWriter.
Revision Changes Path
1.54 +23 -1 xml-xerces/c/doc/faq-parse.xml
Index: faq-parse.xml
===================================================================
RCS file: /home/cvs/xml-xerces/c/doc/faq-parse.xml,v
retrieving revision 1.53
retrieving revision 1.54
diff -u -r1.53 -r1.54
--- faq-parse.xml 23 Jan 2003 19:34:43 -0000 1.53
+++ faq-parse.xml 24 Jan 2003 19:59:56 -0000 1.54
@@ -749,7 +749,7 @@
<faq title="Why do I get segmentation fault when running on Redhat Linux?">
- <q> Why do I get segmentation fault when running on Redhat Linux?</q>
+ <q>Why do I get segmentation fault when running on Redhat Linux?</q>
<a>
@@ -759,6 +759,28 @@
Please try to upgrade your Redhat Linux gcc to the latest patch level and
see if it helps.
</p>
+ </a>
+ </faq>
+
+ <faq title="Why does the XML data generated by the DOMWriter not match my original XML input?">
+
+ <q>Why does the XML data generated by the DOMWriter not match my original XML input?</q>
+
+ <a>
+
+ <p>If you parse an XML document using XercesDOMParser or DOMBuilder and pass the resulting DOMNode
+ to DOMWriter for serialization, the output may not be exactly the same
+ as the original XML data. The parser may have done normalization, end of line conversion,
+ or expanded entity references as per the XML 1.0 spec, 4.4 XML Processor Treatment of
+ Entities and References. From the DOMWriter's perspective, it does not know what the original
+ string was; all it sees is a processed DOMNode generated by the parser.
+ But since the DOMWriter is supposed to generate something that is parsable if sent
+ back to the parser, it will not print the DOMNode node value as is. The DOMWriter
+ may do some "touch up" to the output data for it to be parsable.</p>
+
+ <p>See <jump href="program-dom.html#DOMWriterEntityRef">How does DOMWriter handle built-in entity
+ Reference in node value?</jump> for more on how DOMWriter touches up entity references.
+ </p>
</a>
</faq>
1.26 +102 -0 xml-xerces/c/doc/program-dom.xml
Index: program-dom.xml
===================================================================
RCS file: /home/cvs/xml-xerces/c/doc/program-dom.xml,v
retrieving revision 1.25
retrieving revision 1.26
diff -u -r1.25 -r1.26
--- program-dom.xml 9 Jan 2003 20:15:39 -0000 1.25
+++ program-dom.xml 24 Jan 2003 19:59:56 -0000 1.26
@@ -1324,6 +1324,108 @@
</p>
</s3>
+ <anchor name="DOMWriterEntityRef"/>
+ <s3 title="How does DOMWriter handle built-in entity Reference in node value?">
+
+ <p>Suppose, for example, you parse the following XML document using XercesDOMParser or DOMBuilder:</p>
+<source>
+<root>
+<Test attr=" > ' &lt; &gt; &amp; &quot; &apos; "></Test>
+<Test attr=' > " &lt; &gt; &amp; &quot; &apos; '></Test>
+<Test> > " ' &lt; &gt; &amp; &quot; &apos; </Test>
+<Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
+</root>
+</source>
+ <p>According to the XML 1.0 spec, 4.4 XML Processor Treatment of Entities and References, the parser
+ will expand the entity references as follows</p>
+<source>
+<root>
+<Test attr=" > ' < > & " ' "></Test>
+<Test attr=' > " < > & " ' '></Test>
+<Test> > " ' < > & " ' </Test>
+<Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
+</root>
+</source>
+
+ <p>and pass the resulting DOMNode to DOMWriter for serialization. From the DOMWriter's perspective, it
+ does not know what the original string was. All it sees is the above DOMNode from the
+ parser. But since the DOMWriter is supposed to generate something that is parsable if sent
+ back to the parser, it cannot print such a string as is. Thus the DOMWriter does some
+ "touch up", just enough to make the string parsable.</p>
+
+ <p>For example, since a literal < or & in a text value makes the XML
+ not well-formed, the DOMWriter writes them as &lt; and &amp;
+ respectively, while the >, ' and " in a text value are acceptable to the parser, so DOMWriter does not
+ do anything to them. Similarly, the DOMWriter fixes some of the characters in attribute values
+ but keeps everything in CDATA sections unchanged.</p>
+
+ <p>So the string generated by DOMWriter will look like this:</p>
+<source>
+<root>
+<Test attr=" > ' &lt; > &amp; &quot; ' "/>
+<Test attr=" > &quot; &lt; > &amp; &quot; ' "/>
+<Test> > " ' &lt; > &amp; " ' </Test>
+<Test><![CDATA[< > & " ' &lt; &gt; &amp; &quot; &apos; ]]></Test>
+</root>
+</source>
+ <p>To summarize, the following table shows how built-in entity references are handled for
+ different node types:</p>
+ <table>
+ <tr>
+ <th><em>Input/Output</em></th>
+ <th><em><</em></th>
+ <th><em>></em></th>
+ <th><em>&</em></th>
+ <th><em>"</em></th>
+ <th><em>'</em></th>
+ <th><em>&lt;</em></th>
+ <th><em>&gt;</em></th>
+ <th><em>&amp;</em></th>
+ <th><em>&quot;</em></th>
+ <th><em>&apos;</em></th>
+ </tr>
+ <tr>
+ <td><em>Attribute</em></td>
+ <td>N/A</td>
+ <td>></td>
+ <td>N/A</td>
+ <td>&quot;</td>
+ <td>'</td>
+ <td>&lt;</td>
+ <td>></td>
+ <td>&amp;</td>
+ <td>&quot;</td>
+ <td>'</td>
+ </tr>
+ <tr>
+ <td><em>Text</em></td>
+ <td>N/A</td>
+ <td>></td>
+ <td>N/A</td>
+ <td>"</td>
+ <td>'</td>
+ <td>&lt;</td>
+ <td>></td>
+ <td>&amp;</td>
+ <td>"</td>
+ <td>'</td>
+ </tr>
+ <tr>
+ <td><em>CDATA</em></td>
+ <td><</td>
+ <td>></td>
+ <td>&</td>
+ <td>"</td>
+ <td>'</td>
+ <td>&lt;</td>
+ <td>&gt;</td>
+ <td>&amp;</td>
+ <td>&quot;</td>
+ <td>&apos;</td>
+ </tr>
+ </table>
+ </s3>
+
<anchor name="DOMWriterFeatures"/>
<s3 title="DOMWriter Supported Features">
1.35 +1 -0 xml-xerces/c/doc/program.xml
Index: program.xml
===================================================================
RCS file: /home/cvs/xml-xerces/c/doc/program.xml,v
retrieving revision 1.34
retrieving revision 1.35
diff -u -r1.34 -r1.35
--- program.xml 6 Jan 2003 21:19:41 -0000 1.34
+++ program.xml 24 Jan 2003 19:59:56 -0000 1.35
@@ -35,6 +35,7 @@
<li><jump href="program-dom.html#DOMWriter">DOMWriter</jump></li>
<ul>
<li><jump href="program-dom.html#ConstructDOMWriter">Constructing a DOMWriter</jump></li>
+ <li><jump href="program-dom.html#DOMWriterEntityRef">How does DOMWriter handle built-in entity Reference in node value?</jump></li>
<li><jump href="program-dom.html#DOMWriterFeatures">Supported Features</jump></li>
</ul>
<li><jump href="program-dom.html#Deprecated">Deprecated - Java-like DOM</jump></li>
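
Note on the program-dom.xml change: the touch-up rules in the summary table added there can be sketched in standalone C++ as below. This is only an illustration of the table, not the DOMWriter implementation. The names NodeKind and touchUp are hypothetical, no Xerces-C API is used, and the input is assumed to be a node value in which the parser has already expanded the built-in entity references.

```cpp
#include <string>

// Hypothetical node kinds for this sketch; these are not Xerces-C types.
enum class NodeKind { Attribute, Text, CData };

// Escape an already-expanded node value just enough for it to be parsable
// again, following the summary table:
//   '<' and '&' are escaped in attribute and text values,
//   '"' is escaped in attribute values only,
//   '>' and '\'' are left alone,
//   CDATA content is written unchanged.
std::string touchUp(const std::string& value, NodeKind kind) {
    if (kind == NodeKind::CData) {
        return value;
    }
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '<':
                out += "&lt;";
                break;
            case '&':
                out += "&amp;";
                break;
            case '"':
                if (kind == NodeKind::Attribute) {
                    out += "&quot;";
                } else {
                    out += c;
                }
                break;
            default:
                out += c;
                break;
        }
    }
    return out;
}
```

Applied to the expanded values from the example document, this sketch gives the same strings as the serialized output shown in the new section. The attribute case always emits &quot; because the serialized output quotes attribute values with double quotes.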