This is an automated email from the ASF dual-hosted git repository.

mbeckerle pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-daffodil-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 36e5d6c  Publishing from 3a4708345a723b264aa6a666c67089c40d4c741a
36e5d6c is described below

commit 36e5d6c75069467ccfbafaa4d8e4282a037c4af2
Author: Michael Beckerle <[email protected]>
AuthorDate: Thu Apr 4 20:42:46 2019 -0400

    Publishing from 3a4708345a723b264aa6a666c67089c40d4c741a
---
 content/infoset/index.html | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/content/infoset/index.html b/content/infoset/index.html
index 85634df..6b6c392 100644
--- a/content/infoset/index.html
+++ b/content/infoset/index.html
@@ -534,16 +534,16 @@ but extended to handle all the XML 1.0 illegal characters 
including those
 with 16-bit codepoint values. This mapping is used bi-directionally, that is,
 illegal characters are replaced by their legal counterparts when parsing, and
 the reverse transformation is performed when unparsing, thereby allowing the
-creation of data containing the XML illegal characters from legal XML
+creation of data streams containing the XML illegal characters from legal XML
 documents that contain only the corresponding mapped PUA characters.</p>
 
 <p>These are the legal XML characters (for XML v1.0):</p>
 
-<div class="highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>#x0 | #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] 
| [#x10000-#x10FFFF] | #xD (treated specially)
+<div class="highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code> #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
[#x10000-#x10FFFF] 
 </code></pre></div></div>
-
-<p>Illegal characters from <code class="highlighter-rouge">#x00</code> to 
<code class="highlighter-rouge">#x1F</code> are mapped to the PUA
-by adding <code class="highlighter-rouge">#xE000</code> to their character 
code.</p>
+<p>All other characters are illegal.
+Illegal characters from <code class="highlighter-rouge">#x00</code> to <code 
class="highlighter-rouge">#x1F</code> are mapped to the PUA
+by adding <code class="highlighter-rouge">#xE000</code> to their character 
code. Hence, the NUL (#x0) character code becomes #xE000.</p>
 
 <p>Illegal characters from <code class="highlighter-rouge">#xD800</code> to 
<code class="highlighter-rouge">#xDFFF</code> are mapped to the PUA by adding
 <code class="highlighter-rouge">#x1000</code> to their character code. So 
<code class="highlighter-rouge">#xD800</code> maps to <code 
class="highlighter-rouge">#xE800</code>, and
@@ -553,16 +553,18 @@ by adding <code class="highlighter-rouge">#xE000</code> 
to their character code.
 subtracting <code class="highlighter-rouge">#x0F00</code> from their character 
code, so to characters <code class="highlighter-rouge">#xF0FE</code>
 and <code class="highlighter-rouge">#xF0FF</code>.</p>
 
-<p>Character <code class="highlighter-rouge">#xD</code> (Carriage Return or 
CR) is mapped to <code class="highlighter-rouge">#xA</code> (Line Feed, or
+<p>The legal character <code class="highlighter-rouge">#xD</code> (Carriage 
Return or CR) is mapped to <code class="highlighter-rouge">#xA</code> (Line 
Feed, or
 LF). The CR character is allowed in the textual representation of XML
 documents, but is always converted to LF in the XML Infoset. That is, it is
 read by XML processors, but CRLF is converted to just LF, and CR alone is
 converted to LF. Daffodil is in a sense a different ‘reader’ of data into the
 XML infoset, so to be consistent with XML we map CR and CRLF to LF.</p>
 
-<p>It is a processing error when parsing if any DFDL infoset string contains
+<p>It is a processing error when parsing if the data stream contains
 characters in the parts of the PUA used by this mapping for illegal XML
-codepoints.</p>
+codepoints. When unparsing, characters such as #xE000 found in infoset 
string values are mapped back to the corresponding illegal code 
points (#xE000 becomes #x0, aka NUL).</p>
+
+<p>The XML for an infoset can conveniently embed the #xE000 character, or any 
of the other “illegal” characters mapped into the PUA, by using XML numeric 
character references such as “&amp;#xE000;”. This is turned into the #xE000 
code point when the XML document is loaded. Daffodil then maps it to #x0 
(aka NUL) when unparsing.</p>
 
 <p>It is a processing error if any DFDL infoset string character is created 
with a
 character code greater than <code 
class="highlighter-rouge">#x10FFFF</code>.</p>
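
For readers of this change, the mapping the page describes can be sketched
roughly as follows. This is a minimal illustration built only from the rules
quoted in the diff above; it is not Daffodil's implementation. The names
to_infoset and from_infoset are hypothetical, and treating exactly the mapped
target code points as the reserved part of the PUA is an assumption.

```python
# Hypothetical sketch of the illegal-XML-character <-> PUA mapping described
# in the diff above. Not Daffodil code; all names are illustrative.

def _build_forward_map():
    fwd = {}
    # #x00-#x1F (except the legal #x9, #xA, #xD): add #xE000.
    for cp in range(0x00, 0x20):
        if cp not in (0x9, 0xA, 0xD):
            fwd[cp] = cp + 0xE000
    # #xD800-#xDFFF: add #x1000.
    for cp in range(0xD800, 0xE000):
        fwd[cp] = cp + 0x1000
    # #xFFFE and #xFFFF: subtract #x0F00.
    for cp in (0xFFFE, 0xFFFF):
        fwd[cp] = cp - 0x0F00
    return fwd


FORWARD = _build_forward_map()
# Assumption: the reserved PUA code points are exactly the mapping targets.
REVERSE = {pua: illegal for illegal, pua in FORWARD.items()}


def to_infoset(data):
    """Parse direction: replace illegal characters with PUA counterparts."""
    # CRLF and lone CR both become LF, as an XML processor would do.
    data = data.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    for ch in data:
        cp = ord(ch)
        if cp in REVERSE:
            # The data already uses a PUA code point reserved by the mapping.
            raise ValueError("reserved PUA code point in data: %#x" % cp)
        out.append(chr(FORWARD.get(cp, cp)))
    return "".join(out)


def from_infoset(text):
    """Unparse direction: map reserved PUA characters back to the originals."""
    return "".join(chr(REVERSE.get(ord(ch), ord(ch))) for ch in text)
```

Note that in this sketch the CR-to-LF conversion is one-way: from_infoset
restores NUL and the other mapped characters, but it cannot tell whether an
LF was originally a CR or a CRLF.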
