[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774590#comment-16774590
 ] 

Jason Harrop edited comment on XALANJ-2419 at 2/21/19 11:31 PM:
----------------------------------------------------------------

When I run the smoketest under Java 8, all tests pass.

When I compile using Java 11 and run the smoketest, for testcase2 I get:

{{
<checkresult result="Pass" desc="Simple characters should come out unscathed"/>
<checkresult result="Pass" desc="Simple characters should come out unscathed (as attribute value)"/>
<checkresult result="Fail" desc="ISO-8859-1 characters should come out as entities"/>
<checkresult result="Fail" desc="ISO-8859-1 characters should come out as entities (as attribute value)"/>
<checkresult result="Pass" desc="BMP characters should come out as entities"/>
<checkresult result="Pass" desc="BMP characters should come out as entities (as attribute value)"/>
<checkresult result="Pass" desc="String with two astral characters and one in the BMP should have length of 5 UTF-16 code units"/>
<checkresult result="Fail" desc="Astral characters should come out as entities"/>
<checkresult result="Fail" desc="Astral characters should come out as entities (as attribute value)"/>
}}
 



> Astral characters written as a pair of NCRs with the surrogate scalar values 
> when using UTF-8
> ---------------------------------------------------------------------------------------------
>
>                 Key: XALANJ-2419
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
>             Project: XalanJ2
>          Issue Type: Bug
>          Components: Serialization
>    Affects Versions: 2.7.1
>            Reporter: Henri Sivonen
>            Priority: Major
>         Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>                     else if (m_encodingInfo.isInEncoding(ch)) {
>                         // If the character is in the encoding, and
>                         // not in the normal ASCII range, we also
>                         // just leave it get added on to the clean characters
>                         
>                     }
>                     else {
>                         // This is a fallback plan, we should never get here
>                         // but if the character wasn't previously handled
>                         // (i.e. isn't in the encoding, etc.) then what
>                         // should we do?  We choose to write out an entity
>                         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>                         writer.write("&#");
>                         writer.write(Integer.toString(ch));
>                         writer.write(';');
>                         lastDirtyCharProcessed = i;
>                     }
> Because isInEncoding() for UTF-8 returns false for surrogates, the wrong 
> (latter) branch runs for them, and each surrogate is written as its own NCR. 
> It is always wrong (regardless of encoding) to escape a surrogate as an NCR.
> The practical effect of this bug is that any document containing astral 
> characters is serialized as ill-formed XML and does not parse back using an 
> XML parser.
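
To make the failure mode in the description concrete, here is a minimal sketch of the intended behaviour: a surrogate pair is combined into one code point before the NCR is written. This is not Xalan code and not the attached patch. The class and method names are hypothetical, escaping every non-ASCII character is a simplification (ToStream consults the output encoding instead), and replacing a lone surrogate with U+FFFD is an assumption made only for this example.

```java
// Hypothetical illustration only; not part of org.apache.xml.serializer.
final class SurrogateNcrSketch {

    // Escapes every non-ASCII character as a decimal NCR.
    // Simplification: the real serializer checks the output encoding first.
    public static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            // codePointAt() combines a valid high/low surrogate pair
            // into a single code point above U+FFFF.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out.append((char) cp);
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                // Lone surrogate: has no valid NCR. Assumption for this
                // sketch: substitute U+FFFD (decimal 65533).
                out.append("&#65533;");
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String astral = "\uD83D\uDE00"; // U+1F600 as a UTF-16 surrogate pair
        // Prints a&#128512;b
        System.out.println(escapeNonAscii("a" + astral + "b"));
    }
}
```

Escaping the two UTF-16 code units separately, as the quoted fallback branch does, would instead give &#55357;&#56832; for the same character, which is the ill-formed output the issue title describes.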


