Bug 6541476; State: 8-Fix Available, bug; Priority: 4-Low
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6541476

This is the second half od the issues mentioned in bug report 6541476.
While my previous post for this bug number concentrated on the metadata
class and the use of typesafe generics, this one here deals with the
reader and writer classes and how they handle UTF-8 data.

The current status of the reporitory does this:
1. translatedKeyword written in latin1 (writeBytes) but tried to read
   as modified UTF-8 prefixed with byte count (readUTF). The spec
   requires unmodified UTF-8 without byte count.
2. uncompressed text read and written as modified UTF-8 with byte count,
   in violation of the spec which requires unmodified UTF-8 and no byte
   count.
3. compressed text is stored and loaded as latin1. High bits are
   silently dropped when storing. The spec again requires unmodified
   UTF-8.

My patch atempts to fix these issues, and to verify them using a
suitable test case. As often there are multiple ways to do things. I'll
highlight a few of the choices I made, if you want to discuss them.

I use the Charset class as a central instrument for UTF-8 conversion.
I use nio buffers as input and output of its encode and decode methods.
This means that I have a bit more syntactic overhead than I would have
using simply String.getBytes(utf8) and friends. It also means that I
have the nio objects at hand if I ever chose to work on improving
performance.

Other than these buffer objects, I stick with the Stream API, not the
nio channels. I wrote a wrapper class to turn an ImageOutputStream into
an OutputStream, in order to layer a DeflatorOutputStream around it.

For writing from a ByteBuffer to an [Image]OutputStream, I constructed a
little helper to distinguish the cases of buffers with and without
backing array. This should benefit performance. In the other hand, in
the reader I simply copied code from the inflate method which reads
characters one at a time, probably terribly inefficient. One might get
better performance here at the cost of increased syntactic overhead.

In the Test I actually check the resulting byte sequence for the
uncompressed text chunk. This is done in order to rule out any symmetric
errors, which would allow correct read-back and still leave incorrect
data in the files. E.g. consistent use of modified UTF-8 would be such a
case. The compressed chunk I don't check exactly, just its header, as
the compression algorithm may yield different but equivalent results for
different implementations.

The attached patch is from my mercurial patch queue. Once you consider
it ready for inclusion, I will commit it locally and export a mercurial
patch instead.

Greetings,
 Martin von Gagern
diff --git a/src/share/classes/com/sun/imageio/plugins/png/PNGImageReader.java 
b/src/share/classes/com/sun/imageio/plugins/png/PNGImageReader.java
--- a/src/share/classes/com/sun/imageio/plugins/png/PNGImageReader.java
+++ b/src/share/classes/com/sun/imageio/plugins/png/PNGImageReader.java
@@ -38,8 +38,11 @@
 import java.io.ByteArrayInputStream;
 import java.io.DataInputStream;
 import java.io.InputStream;
+import java.io.InputStreamReader;
 import java.io.IOException;
 import java.io.SequenceInputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Enumeration;
@@ -453,18 +456,41 @@
         String languageTag = readNullTerminatedString();
         metadata.iTXt_languageTag.add(languageTag);
 
-        String translatedKeyword = stream.readUTF();
+        // Read translated keyword and text in one go.
+        // Both are (unmodified) UTF-8, and the former is null-terminated
+        long pos = stream.getStreamPosition();
+        byte[] b = new byte[(int)(chunkStart + chunkLength - pos)];
+        stream.readFully(b);
+
+        Charset utf8 = Charset.forName("UTF-8");
+        int nullPos;
+        for (nullPos = 0; nullPos < b.length; ++nullPos) {
+            if (b[nullPos] == 0) break;
+        }
+        if (nullPos == b.length) {
+            throw new IIOException("Malformed iTXt chunk");
+        }
+
+        ByteBuffer byteBuf = ByteBuffer.wrap(b, 0, nullPos);
+        String translatedKeyword = utf8.decode(byteBuf).toString();
         metadata.iTXt_translatedKeyword.add(translatedKeyword);
-        stream.skipBytes(1); // Null separator
+        ++nullPos;
 
         String text;
         if (compressionFlag == 1) { // Decompress the text
-            long pos = stream.getStreamPosition();
-            byte[] b = new byte[(int)(chunkStart + chunkLength - pos)];
-            stream.readFully(b);
-            text = inflate(b);
+            InputStream bais = new ByteArrayInputStream(b, nullPos,
+                                                        b.length - nullPos);
+            InputStream iis = new InflaterInputStream(bais);
+            InputStreamReader isr = new InputStreamReader(iis, utf8);
+            StringBuilder sb = new StringBuilder(80);
+            int c;
+            while ((c = isr.read()) != -1) {
+                sb.append((char)c);
+            }
+            text = sb.toString();
         } else {
-            text = stream.readUTF();
+            byteBuf = ByteBuffer.wrap(b, nullPos, b.length - nullPos);
+            text = utf8.decode(byteBuf).toString();
         }
         metadata.iTXt_text.add(text);
     }
diff --git a/src/share/classes/com/sun/imageio/plugins/png/PNGImageWriter.java 
b/src/share/classes/com/sun/imageio/plugins/png/PNGImageWriter.java
--- a/src/share/classes/com/sun/imageio/plugins/png/PNGImageWriter.java
+++ b/src/share/classes/com/sun/imageio/plugins/png/PNGImageWriter.java
@@ -36,6 +36,8 @@
 import java.io.DataOutput;
 import java.io.IOException;
 import java.io.OutputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
 import java.util.Iterator;
 import java.util.Locale;
 import java.util.zip.Deflater;
@@ -268,6 +270,40 @@
         this.locale = locale;
     }
 }
+
+/**
+ * Adapter to turn an ImageOutputStream into an OutputStream.
+ */
+class ImageAdapterOutputStream extends OutputStream {
+
+    final ImageOutputStream out;
+
+    public ImageAdapterOutputStream(ImageOutputStream out) {
+        this.out = out;
+    }
+
+    public void write(int b) throws IOException {
+        out.write(b);
+    }
+
+    public void write(byte[] b) throws IOException {
+        out.write(b);
+    }
+
+    public void write(byte[] b, int off, int len) throws IOException {
+        out.write(b, off, len);
+    }
+
+    public void flush() throws IOException {
+        out.flush();
+    }
+
+    public void close() throws IOException {
+        out.close();
+    }
+
+}
+
 
 /**
  */
@@ -688,6 +724,7 @@
         Iterator translatedKeywordIter =
             metadata.iTXt_translatedKeyword.iterator();
         Iterator textIter = metadata.iTXt_text.iterator();
+        Charset utf8 = Charset.forName("UTF-8");
 
         while (keywordIter.hasNext()) {
             ChunkStream cs = new ChunkStream(PNGImageReader.iTXt_TYPE, stream);
@@ -705,16 +742,35 @@
             cs.writeByte(0);
 
             String translatedKeyword = (String)translatedKeywordIter.next();
-            cs.writeBytes(translatedKeyword);
+            ImageAdapterOutputStream iaos = new ImageAdapterOutputStream(cs);
+            write(iaos, utf8.encode(translatedKeyword));
             cs.writeByte(0);
 
             String text = (String)textIter.next();
+            ByteBuffer textBytes = utf8.encode(text);
             if (flag == 1) {
-                cs.write(deflate(text));
-            } else {
-                cs.writeUTF(text);
+                DeflaterOutputStream dos = new DeflaterOutputStream(iaos);
+                write(dos, textBytes);
+                dos.finish();
+            }
+            else {
+                write(iaos, textBytes);
             }
             cs.finish();
+        }
+    }
+
+    private static void write(OutputStream out, ByteBuffer buf)
+        throws IOException
+    {
+        if (buf.hasArray()) {
+            out.write(buf.array(),
+                      buf.arrayOffset()+buf.position(), buf.remaining());
+        }
+        else {
+            byte[] b = new byte[buf.remaining()];
+            buf.get(b);
+            out.write(b);
         }
     }
 
diff --git a/test/javax/imageio/plugins/png/ItxtUtf8Test.java 
b/test/javax/imageio/plugins/png/ItxtUtf8Test.java
new file mode 100644
--- /dev/null
+++ b/test/javax/imageio/plugins/png/ItxtUtf8Test.java
@@ -0,0 +1,188 @@
+/**
+ * @test
+ * @bug 6541476
+ * @summary Write and read a PNG file including an non-latin1 iTXt chunk
+ */
+
+import java.awt.image.BufferedImage;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.OutputStream;
+import javax.imageio.IIOImage;
+import javax.imageio.ImageIO;
+import javax.imageio.ImageReader;
+import javax.imageio.ImageTypeSpecifier;
+import javax.imageio.ImageWriter;
+import javax.imageio.metadata.IIOMetadata;
+import javax.imageio.stream.ImageInputStream;
+import javax.imageio.stream.ImageOutputStream;
+import javax.imageio.stream.MemoryCacheImageInputStream;
+import javax.imageio.stream.MemoryCacheImageOutputStream;
+import org.w3c.dom.DOMImplementation;
+import org.w3c.dom.Document;
+import org.w3c.dom.Element;
+import org.w3c.dom.Node;
+import org.w3c.dom.bootstrap.DOMImplementationRegistry;
+
+public class ItxtUtf8Test {
+
+    public static final String
+    TEXT = "\u24c9\u24d4\u24e7\u24e3" + // circled letters from BMP
+      "\ud835\udc13\ud835\udc1e\ud835\udc31\ud835\udc2d" + // from other plane
+      "\u24c9\u24d4\u24e7\u24e3", // a repetition for compression
+    VERBATIM = "\u24e5\u24d4\u24e1\u24d1\u24d0\u24e3\u24d8\u24dc",
+    COMPRESSED = 
"\u24d2\u24de\u24dc\u24df\u24e1\u24d4\u24e2\u24e2\u24d4\u24d3";
+
+    public static final byte[]
+    VBYTES = {
+        (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x56, // chunk length
+        (byte)0x69, (byte)0x54, (byte)0x58, (byte)0x74, // chunk type "iTXt"
+        (byte)0x76, (byte)0x65, (byte)0x72, (byte)0x62,
+        (byte)0x61, (byte)0x74, (byte)0x69, (byte)0x6d, // keyword "verbatim"
+        (byte)0x00, // separator terminating keyword
+        (byte)0x00, // compression flag
+        (byte)0x00, // compression method, must be zero
+        (byte)0x78, (byte)0x2d, (byte)0x63, (byte)0x69,
+        (byte)0x72, (byte)0x63, (byte)0x6c, (byte)0x65,
+        (byte)0x64, // language tag "x-circled"
+        (byte)0x00, // separator terminating language tag
+        (byte)0xe2, (byte)0x93, (byte)0xa5, // '\u24e5'
+        (byte)0xe2, (byte)0x93, (byte)0x94, // '\u24d4'
+        (byte)0xe2, (byte)0x93, (byte)0xa1, // '\u24e1'
+        (byte)0xe2, (byte)0x93, (byte)0x91, // '\u24d1'
+        (byte)0xe2, (byte)0x93, (byte)0x90, // '\u24d0'
+        (byte)0xe2, (byte)0x93, (byte)0xa3, // '\u24e3'
+        (byte)0xe2, (byte)0x93, (byte)0x98, // '\u24d8'
+        (byte)0xe2, (byte)0x93, (byte)0x9c, // '\u24dc'
+        (byte)0x00, // separator terminating the translated keyword
+        (byte)0xe2, (byte)0x93, (byte)0x89, // '\u24c9'
+        (byte)0xe2, (byte)0x93, (byte)0x94, // '\u24d4'
+        (byte)0xe2, (byte)0x93, (byte)0xa7, // '\u24e7'
+        (byte)0xe2, (byte)0x93, (byte)0xa3, // '\u24e3'
+        (byte)0xf0, (byte)0x9d, (byte)0x90, (byte)0x93, // '\ud835\udc13'
+        (byte)0xf0, (byte)0x9d, (byte)0x90, (byte)0x9e, // '\ud835\udc1e'
+        (byte)0xf0, (byte)0x9d, (byte)0x90, (byte)0xb1, // '\ud835\udc31'
+        (byte)0xf0, (byte)0x9d, (byte)0x90, (byte)0xad, // '\ud835\udc2d'
+        (byte)0xe2, (byte)0x93, (byte)0x89, // '\u24c9'
+        (byte)0xe2, (byte)0x93, (byte)0x94, // '\u24d4'
+        (byte)0xe2, (byte)0x93, (byte)0xa7, // '\u24e7'
+        (byte)0xe2, (byte)0x93, (byte)0xa3, // '\u24e3'
+        (byte)0xb5, (byte)0xcc, (byte)0x97, (byte)0x56 // CRC
+    },
+    CBYTES = {
+        // we don't want to check the chunk length,
+        // as this might depend on implementation.
+        (byte)0x69, (byte)0x54, (byte)0x58, (byte)0x74, // chunk type "iTXt"
+        (byte)0x63, (byte)0x6f, (byte)0x6d, (byte)0x70,
+        (byte)0x72, (byte)0x65, (byte)0x73, (byte)0x73,
+        (byte)0x65, (byte)0x64, // keyword "compressed"
+        (byte)0x00, // separator terminating keyword
+        (byte)0x01, // compression flag
+        (byte)0x00, // compression method, 0=deflate
+        (byte)0x78, (byte)0x2d, (byte)0x63, (byte)0x69,
+        (byte)0x72, (byte)0x63, (byte)0x6c, (byte)0x65,
+        (byte)0x64, // language tag "x-circled"
+        (byte)0x00, // separator terminating language tag
+        // we don't want to check the actual compressed data,
+        // as this might depend on implementation.
+    };
+/*
+*/
+
+    public static void main(String[] args) throws Exception {
+        String format = "javax_imageio_png_1.0";
+        BufferedImage img =
+            new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
+        ImageWriter iw = ImageIO.getImageWritersByMIMEType("image/png").next();
+        ByteArrayOutputStream os = new ByteArrayOutputStream();
+        ImageOutputStream ios = new MemoryCacheImageOutputStream(os);
+        iw.setOutput(ios);
+        IIOMetadata meta =
+            iw.getDefaultImageMetadata(new ImageTypeSpecifier(img), null);
+        DOMImplementationRegistry registry;
+        registry = DOMImplementationRegistry.newInstance();
+        DOMImplementation impl = registry.getDOMImplementation("XML 3.0");
+        Document doc = impl.createDocument(null, format, null);
+        Element root, itxt, entry;
+        root = doc.getDocumentElement();
+        root.appendChild(itxt = doc.createElement("iTXt"));
+        itxt.appendChild(entry = doc.createElement("iTXtEntry"));
+        entry.setAttribute("keyword", "verbatim");
+        entry.setAttribute("compressionFlag", "false");
+        entry.setAttribute("compressionMethod", "0");
+        entry.setAttribute("languageTag", "x-circled");
+        entry.setAttribute("translatedKeyword", VERBATIM);
+        entry.setAttribute("text", TEXT);
+        itxt.appendChild(entry = doc.createElement("iTXtEntry"));
+        entry.setAttribute("keyword", "compressed");
+        entry.setAttribute("compressionFlag", "true");
+        entry.setAttribute("compressionMethod", "0");
+        entry.setAttribute("languageTag", "x-circled");
+        entry.setAttribute("translatedKeyword", COMPRESSED);
+        entry.setAttribute("text", TEXT);
+        meta.mergeTree(format, root);
+        iw.write(new IIOImage(img, null, meta));
+        iw.dispose();
+
+        byte[] bytes = os.toByteArray();
+        if (args.length == 1 && "-dump".equals(args[0]))
+            System.out.write(bytes);
+        if (findBytes(VBYTES, bytes) < 0)
+            throw new AssertionError("verbatim block not found");
+        if (findBytes(CBYTES, bytes) < 0)
+            throw new AssertionError("compressed block not found");
+
+        ImageReader ir = ImageIO.getImageReader(iw);
+        ByteArrayInputStream is = new ByteArrayInputStream(bytes);
+        ImageInputStream iis = new MemoryCacheImageInputStream(is);
+        ir.setInput(iis);
+        meta = ir.getImageMetadata(0);
+        Node node = meta.getAsTree(format);
+        for (node = node.getFirstChild();
+             !"iTXt".equals(node.getNodeName());
+             node = node.getNextSibling());
+        boolean verbatimSeen = false, compressedSeen = false;
+        for (node = node.getFirstChild();
+             node != null;
+             node = node.getNextSibling()) {
+            entry = (Element)node;
+            String keyword = entry.getAttribute("keyword");
+            String translatedKeyword = entry.getAttribute("translatedKeyword");
+            String text = entry.getAttribute("text");
+            if ("verbatim".equals(keyword)) {
+                if (verbatimSeen) throw new AssertionError("Duplicate");
+                verbatimSeen = true;
+                if (!VERBATIM.equals(translatedKeyword))
+                    throw new AssertionError("Wrong translated keyword");
+                if (!TEXT.equals(text))
+                    throw new AssertionError("Wrong text");
+            }
+            else if ("compressed".equals(keyword)) {
+                if (compressedSeen) throw new AssertionError("Duplicate");
+                compressedSeen = true;
+                if (!COMPRESSED.equals(translatedKeyword))
+                    throw new AssertionError("Wrong translated keyword");
+                if (!TEXT.equals(text))
+                    throw new AssertionError("Wrong text");
+            }
+            else {
+                throw new AssertionError("Unexpected keyword");
+            }
+        }
+        if (!(verbatimSeen && compressedSeen))
+            throw new AssertionError("Missing chunk");
+    }
+
+    private static final int findBytes(byte[] needle, byte[] haystack) {
+        HAYSTACK: for (int h = 0; h <= haystack.length - needle.length; ++h) {
+            for (int n = 0; n < needle.length; ++n) {
+                if (needle[n] != haystack[h + n]) {
+                    continue HAYSTACK;
+                }
+            }
+            return h;
+        }
+        return -1;
+    }
+
+}

Reply via email to