java.xml) 8207760: SAXException: Invalid UTF-16 surrogate detected: d83c ?

Daniel Fuchs Fri, 14 Sep 2018 03:00:10 -0700

Hi Joe,

Thanks for doing that - it's a far better solution.


ToHTMLStream.java:

1432    // move the index if the surrogate pair has been written.
1433    if (writeUTF16Surrogate(ch, chars, i, end) >= 0) {
1434        i++;
1435    }

shouldn't this be:

1433    if (writeUTF16Surrogate(ch, chars, i, end) >= 0) {
            if (Encodings.isHighUTF16Surrogate(ch)) {
                // two input characters processed, increase
                // the index again.
                i++;
            }

IIUC you only want to increase the index if the ch was the
high surrogate and the function has advanced to the low
surrogate?

I mean - a codepoint could have been returned if ch was the
low surrogate, and in that case you don't want to increase
the index twice as only one character has been consumed.

I guess there's the same issue in ToStream.java at lines
1154-1156 and in ToTextStream.java at line 303...

Or am I missing something?

best regards,

-- daniel


On 14/09/2018 04:13, Joe Wang wrote:

Thanks Daniel.
I changed the return of writeUTF16Surrogate so that it is clearer withinwriteUTF16Surrogate when an additional index increment is needed. Othercorresponding changes are in ToHTMLStream and ToTextStream where similarcalls to the method are made. It's been an issue in some part of JAXPimpl where error or warning messages are printed out to the console(e.g. JDK-8000621). But I kept it as is in ToTextStream for this patch.
webrev01:
http://cr.openjdk.java.net/~joehw/jdk12/8207760/webrev01/

Best,
Joe

On 9/13/18, 2:23 AM, Daniel Fuchs wrote:
Hi Joe,

On 13/09/2018 00:25, Lance Andersen wrote:
Hi Joe,

The change  seems reasonable
Agreed. However the following condition in ToStream::handleEscaping
is a bit cryptic:

1155             if ((ihs && (i + 1 < end)) || (ils && i != 0)) {
1156                 i++ ; // process two input characters
1157             }

could the comment be fleshed out to explain it?

I suspect that: `(ihs && (i + 1 < end))` means that
`writeUTF16Surrogate(c, ch, i, end);` has written the two surrogate, in
which case i should be incremented in order to skip the low surrogate
which has just been written.

I am not sure what `(ils && i != 0)` means, though...

best regards

-- daniel
On Sep 12, 2018, at 2:11 PM, Joe Wang <huizhe.w...@oracle.com> wrote:

Hi,
Please review a patch for a situation where a surrogate pair is atthe edge of a buffer. What the existing impl did was to report it asan error. This patch fixes it by caching the high surrogate andprints it out along with the low surrogate. Similar issue existsalso in the CDATA section and is fixed in this patch. The CDATA implhad a couple of bugs where an indent could be written inside theCDATA and an unicode character written in between two CDATAsections. Both are fixed in this patch.
JBS: https://bugs.openjdk.java.net/browse/JDK-8207760
webrevs: http://cr.openjdk.java.net/~joehw/jdk12/8207760/webrev/

Thanks,
Joe
<http://oracle.com/us/design/oracle-email-sig-198324.gif>
<http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif>LanceAndersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>

Re: RFR(JDK12/JAXP/java.xml) 8207760: SAXException: Invalid UTF-16 surrogate detected: d83c ?

Reply via email to