[PATCH] JDK-8216140: Correct UnicodeDecoder U+FFFE handling

Giovanni Gatti Pinheiro Wed, 09 Jan 2019 09:21:56 -0800

Hello,

I ran into this bug in the past and would like to fix it. The patch with
the fix is attached to this message.


I have a few questions on this subject.

1.    Where should the test be placed?
It is not clear to me what the standard approach is. I spent about an hour
searching the code base for the right place, without success. I then
applied the fix and ran the nio tests, expecting something to break, but no
test failed. The closest I got to a reasonable place is the TestUTF_16
class. That class contains a commented-out test, "Reversed BOM in middle of
stream Negative test.", which checks the opposite of what I’m trying to
do.

2.    What to do with UTF-8/UTF-32?
I tested UTF-8 and UTF-32 to see how these two implementations handle
U+FFFE in the middle of a byte stream. Both are compliant with the Unicode
specification, so it looks like this bug applies only to the UTF-16
implementation. It’s awkward that the three encodings do not behave the
same way, so I would like to confirm with you that I don’t have to do
anything special for UTF-8 and UTF-32.

3.    What exactly should I test?
Strictly speaking, I should test that no Unicode noncharacter is reported
as malformed input. Do I have to go that far, or is testing U+FFFE alone
enough?

4.    Do I have to sign the OCA for this contribution?
It’s really a small fix and I don’t care about credit. Still, if I must,
just let me know and I will do it ASAP.
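
For reference, the comparison behind questions 2 and 3 can be scripted as a
small standalone program. This is only a sketch and not part of the patch:
the class name NoncharacterDecodeCheck and its two helper methods are made
up for illustration, and only the java.nio.charset calls are actual JDK
API. It enumerates the 66 Unicode noncharacters, places each one between
'a' and 'd' (as in the new test in the patch), and checks whether a decoder
set to REPORT rejects it. No output is listed here, since the UTF-16LE
result depends on whether the JDK in use already contains the fix.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class NoncharacterDecodeCheck {

    // The 66 Unicode noncharacters: U+FDD0..U+FDEF, plus the last two
    // code points of each of the 17 planes (U+nFFFE and U+nFFFF).
    static List<Integer> noncharacters() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0xFDD0; cp <= 0xFDEF; cp++) {
            result.add(cp);
        }
        for (int plane = 0; plane <= 0x10; plane++) {
            result.add((plane << 16) | 0xFFFE);
            result.add((plane << 16) | 0xFFFF);
        }
        return result;
    }

    // Encodes "a" + cp + "d" in the given charset, then decodes it with
    // error reporting enabled. Returns true only if decoding succeeds
    // and yields the original text.
    static boolean decodesCleanly(String charsetName, int cp) {
        Charset cs = Charset.forName(charsetName);
        String original = "a" + new String(Character.toChars(cp)) + "d";
        byte[] bytes = original.getBytes(cs);
        try {
            CharBuffer out = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return out.toString().equals(original);
        } catch (CharacterCodingException e) {
            // The decoder reported malformed or unmappable input.
            return false;
        }
    }

    public static void main(String[] args) {
        // The LE variants are used so that no BOM is written or expected.
        String[] charsets = {"UTF-8", "UTF-16LE", "UTF-32LE"};
        for (String name : charsets) {
            List<String> rejected = new ArrayList<>();
            for (int cp : noncharacters()) {
                if (!decodesCleanly(name, cp)) {
                    rejected.add(String.format("U+%04X", cp));
                }
            }
            System.out.println(name + " rejected: " + rejected);
        }
    }
}
```

The decoder is used directly rather than new String(bytes, charset)
because the String constructor silently replaces malformed input instead
of reporting it.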

So, let me know exactly what to do and I will take care of it.

Thank you in advance.

Best regards,

Giovanni GATTI PINHEIRO

diff --git a/src/java.base/share/classes/sun/nio/cs/UnicodeDecoder.java b/src/java.base/share/classes/sun/nio/cs/UnicodeDecoder.java
index c3509d7..e6bcef7 100644
--- a/src/java.base/share/classes/sun/nio/cs/UnicodeDecoder.java
+++ b/src/java.base/share/classes/sun/nio/cs/UnicodeDecoder.java
@@ -91,11 +91,6 @@ abstract class UnicodeDecoder extends CharsetDecoder {
 
                 char c = decode(b1, b2);
 
-                if (c == REVERSED_MARK) {
-                    // A reversed BOM cannot occur within middle of stream
-                    return CoderResult.malformedForLength(2);
-                }
-
                 // Surrogates
                 if (Character.isSurrogate(c)) {
                     if (Character.isHighSurrogate(c)) {
diff --git a/test/jdk/sun/nio/cs/TestUTF_16.java b/test/jdk/sun/nio/cs/TestUTF_16.java
index 25344dd..a3f3e2d 100644
--- a/test/jdk/sun/nio/cs/TestUTF_16.java
+++ b/test/jdk/sun/nio/cs/TestUTF_16.java
@@ -184,7 +184,24 @@ public class TestUTF_16 {
            throw new Exception ("Incorrectly parsed BOM in middle of input");
         */
 
-            // Fixed included with bug 4403848 fixes buffer sizing
+        // Test 7: Decoding does not report unicode non-character (U+FFFE)
+        if (StandardCharsets.UTF_16LE.newDecoder()
+                .onMalformedInput(CodingErrorAction.REPORT)
+                .decode(ByteBuffer.allocate(6)
+                                .put(new byte[]
+                                        {(byte) 0x61, (byte) 0x00,
+                                                (byte) 0xfe, (byte) 0xff,
+                                                (byte) 0x64, (byte) 0x00})
+                                .flip(),
+                        CharBuffer.allocate(3),
+                        true)
+                .isMalformed()) {
+
+            throw new Exception("REGTEST TestUTF16 non-character U+FFFE test failed");
+        }
+
+
+        // Fixed included with bug 4403848 fixes buffer sizing
             // issue due to non provision of additional 2 bytes
             // headroom for initial BOM bytes for UTF-16 encoding.
           System.err.println ("OVERALL PASS OF UTF-16 Test");
