Find the proposed patch attached. Some comments and explanations, here:

There is a quite interesting implementation in Manifest and Attributes
worth quite some explanation. The way it used to work before was:

1. put manifest header name, colon and space into a StringBuffer
-> the buffer now contains a string of characters each high-byte of
which is zero as explained later why this is important. the high-bytes
are zero because the set of allowed characters is very limited to ":",
" ", "a" - "z", "A" - "Z", "0" - "9", "_", and "-" according to
Attributes.Name#hash(String) so far with only the name and the
separator and yet without the values.

2. if the value is not null, encode it in UTF-8 into a byte array and
instantiate a String with it using deprecated
String#String(byte[],int,int,int) resulting in a String with the same
length as the byte array before holding one byte in each character's
low-byte. This makes a difference for characters encoded with more than
one byte in UTF-8. The new String is potentially longer than the
original value.

3. if the value is not null, append value to buffer. The one UTF-8
encoded byte per character from the appended string is preserved also
in the buffer along with the previous buffer contents.

3alt. if the value is null, add "null" to the buffer. See
java.lang.AbstractStringBuilder#append(String). Neither of the
characters of "null" has a non-zero high-byte encoded as UTF-16 chars.

4. make72Safe inserts line breaks with continuation spaces. Note that
the buffer here contains only one byte per character because all high-
bytes are still zero so that line.length() and line.insert(index, ...)
effectively operate with byte offsets and not characters.

5. buffer.toString()

6. DataOutputStream#writeBytes(String). First of all read the JavaDoc
comment for it, which explains it all:
    Writes out the string to the underlying output stream as a
    sequence of bytes. Each character in the string is written out, in
    sequence, by discarding its high eight bits. If no exception is
    thrown, the counter <code>written</code> is incremented by the
    length of <code>s</code>
This restores the earlier UTF-8 encoding correctly.

The topic has been discussed and mentioned already in
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht
ml
https://bugs.openjdk.java.net/browse/JDK-6202130

String(byte[],int,int,int) works "well" or "well enough" only together
with DataOutputStream#writeBytes(String). When removing
String(byte[],int,int,int) from Manifest and Attributes because
deprecated, it makes no sense to keep using
DataOutputStream#writeBytes(String) either.

For the same reason as String#String(byte[],int,int,int) has been
deprecated, I suggest to also deprecate
java.io.DataOutput#writeBytes(String) as a separate issue. This might
relate to https://bugs.openjdk.java.net/browse/JDK-6400767 but that one
came to a different conclusion some ten years ago.

I preferred to stick with the DataOutputStream even though not strictly
necessary any more. It is and has been in the API of Attributes
(unfortunately not private) and should better not be removed by
changing the parameter type. Same for Manifest#make72Safe(StringBuffer)
which I deprecated rather than having removed. Someone could have
extended a class from Manifest and use such a method and when changing
the signature it could no longer even compile in a far-fetched case.

LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and
LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to
invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n " with
the idea to slightly improve performance this way. I figured it does
not need JavaDoc comments but would be happy to add them if desired.

I removed "XXX Need to handle UTF8 values." from Manifest#read after
adding a test for it in ValueUtf8Coding. This change and test also
relate to bug 6202130 but does not solve that one completely.
ValueUtf8Coding demonstrates that Manifest can read UTF-8 encoded
values which is a necessary test case to cover for this patch here.
ValueUtf8Coding is the same test as already submitted and suggested
earlier. See
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa
d.html#55848

Indentation in Attributes#write(DataOutputStream) was five spaces on
most lines. I fixed indentation only on the lines changed anyway.

I replaced String#String(byte[],int,int,String) with
String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8)
which as a difference does not declare to throw a
java.io.UnsupportedEncodingException. That also replaced "UTF8" as a
charset name which I would consider not optimal regarding
sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName().

In my opinion there is still some duplicated or at least very similar
code in Manifest#write, Attributes#writeMain, and Attributes#write but
I preferred to change less rather than more and not to further refactor
and re-combine it.

In EmptyKeysAndValues and NullKeysAndValues tests I tried to
demonstrate that the changed implementation does not change behaviour
also in edge cases. I would have expected not having to test all these
cases but then I realized it was possible to test and is therefore
possible in a real use case as well however far-fetched. At least the

    if (value != null) {

lines (three times) most obviously demand to test the null value cases.

I'm looking curiously forward to any kind of feedback or opinion.

Philipp
diff -r 54aafb3ba9ab src/java.base/share/classes/java/util/jar/Attributes.java
--- a/src/java.base/share/classes/java/util/jar/Attributes.java	Mon Sep 24 22:12:07 2018 -0700
+++ b/src/java.base/share/classes/java/util/jar/Attributes.java	Sat Dec 01 12:11:06 2018 +0100
@@ -36,6 +36,8 @@
 
 import sun.util.logging.PlatformLogger;
 
+import static java.nio.charset.StandardCharsets.UTF_8;
+
 /**
  * The Attributes class maps Manifest attribute names to associated string
  * values. Valid attribute names are case-insensitive, are restricted to
@@ -298,25 +300,15 @@
      * Writes the current attributes to the specified data output stream.
      * XXX Need to handle UTF8 values and break up lines longer than 72 bytes
      */
-     @SuppressWarnings("deprecation")
-     void write(DataOutputStream os) throws IOException {
+    void write(DataOutputStream out) throws IOException {
          for (Entry<Object, Object> e : entrySet()) {
              StringBuffer buffer = new StringBuffer(
                                          ((Name) e.getKey()).toString());
              buffer.append(": ");
-
-             String value = (String) e.getValue();
-             if (value != null) {
-                 byte[] vb = value.getBytes("UTF8");
-                 value = new String(vb, 0, 0, vb.length);
-             }
-             buffer.append(value);
-
-             Manifest.make72Safe(buffer);
-             buffer.append("\r\n");
-             os.writeBytes(buffer.toString());
+            buffer.append((String) e.getValue());
+            Manifest.write72(out, buffer.toString());
          }
-        os.writeBytes("\r\n");
+        Manifest.write72(out, "");
     }
 
     /*
@@ -326,7 +318,6 @@
      *
      * XXX Need to handle UTF8 values and break up lines longer than 72 bytes
      */
-    @SuppressWarnings("deprecation")
     void writeMain(DataOutputStream out) throws IOException
     {
         // write out the *-Version header first, if it exists
@@ -338,7 +329,7 @@
         }
 
         if (version != null) {
-            out.writeBytes(vername+": "+version+"\r\n");
+            out.write((vername + ": " + version + "\r\n").getBytes(UTF_8));
         }
 
         // write out all attributes except for the version
@@ -346,30 +337,18 @@
         for (Entry<Object, Object> e : entrySet()) {
             String name = ((Name) e.getKey()).toString();
             if ((version != null) && !(name.equalsIgnoreCase(vername))) {
-
                 StringBuffer buffer = new StringBuffer(name);
                 buffer.append(": ");
-
-                String value = (String) e.getValue();
-                if (value != null) {
-                    byte[] vb = value.getBytes("UTF8");
-                    value = new String(vb, 0, 0, vb.length);
-                }
-                buffer.append(value);
-
-                Manifest.make72Safe(buffer);
-                buffer.append("\r\n");
-                out.writeBytes(buffer.toString());
+                buffer.append((String) e.getValue());
+                Manifest.write72(out, buffer.toString());
             }
         }
-        out.writeBytes("\r\n");
+        Manifest.write72(out, "");
     }
 
     /*
      * Reads attributes from the specified input stream.
-     * XXX Need to handle UTF8 values.
      */
-    @SuppressWarnings("deprecation")
     void read(Manifest.FastInputStream is, byte[] lbuf) throws IOException {
         String name = null, value;
         byte[] lastline = null;
@@ -401,7 +380,7 @@
                     lastline = buf;
                     continue;
                 }
-                value = new String(buf, 0, buf.length, "UTF8");
+                value = new String(buf, 0, buf.length, UTF_8);
                 lastline = null;
             } else {
                 while (lbuf[i++] != ':') {
@@ -412,13 +391,13 @@
                 if (lbuf[i++] != ' ') {
                     throw new IOException("invalid header field");
                 }
-                name = new String(lbuf, 0, 0, i - 2);
+                name = new String(lbuf, 0, i - 2, UTF_8);
                 if (is.peek() == ' ') {
                     lastline = new byte[len - i];
                     System.arraycopy(lbuf, i, lastline, 0, len - i);
                     continue;
                 }
-                value = new String(lbuf, i, len - i, "UTF8");
+                value = new String(lbuf, i, len - i, UTF_8);
             }
             try {
                 if ((putValue(name, value) != null) && (!lineContinued)) {
diff -r 54aafb3ba9ab src/java.base/share/classes/java/util/jar/Manifest.java
--- a/src/java.base/share/classes/java/util/jar/Manifest.java	Mon Sep 24 22:12:07 2018 -0700
+++ b/src/java.base/share/classes/java/util/jar/Manifest.java	Sat Dec 01 12:11:06 2018 +0100
@@ -34,6 +34,8 @@
 import java.util.HashMap;
 import java.util.Iterator;
 
+import static java.nio.charset.StandardCharsets.UTF_8;
+
 /**
  * The Manifest class is used to maintain Manifest entry names and their
  * associated Attributes. There are main Manifest Attributes as well as
@@ -136,14 +138,14 @@
 
     /**
      * Writes the Manifest to the specified OutputStream.
-     * Attributes.Name.MANIFEST_VERSION must be set in
+     * {@link Attributes.Name.MANIFEST_VERSION} or
+     * {@link Attributes.Name.SIGNATURE_VERSION} must be set in
      * MainAttributes prior to invoking this method.
      *
      * @param out the output stream
      * @exception IOException if an I/O error has occurred
      * @see #getMainAttributes
      */
-    @SuppressWarnings("deprecation")
     public void write(OutputStream out) throws IOException {
         DataOutputStream dos = new DataOutputStream(out);
         // Write out the main attributes for the manifest
@@ -151,15 +153,8 @@
         // Now write out the per-entry attributes
         for (Map.Entry<String, Attributes> e : entries.entrySet()) {
             StringBuffer buffer = new StringBuffer("Name: ");
-            String value = e.getKey();
-            if (value != null) {
-                byte[] vb = value.getBytes("UTF8");
-                value = new String(vb, 0, 0, vb.length);
-            }
-            buffer.append(value);
-            make72Safe(buffer);
-            buffer.append("\r\n");
-            dos.writeBytes(buffer.toString());
+            buffer.append(e.getKey());
+            write72(dos, buffer.toString());
             e.getValue().write(dos);
         }
         dos.flush();
@@ -167,7 +162,10 @@
 
     /**
      * Adds line breaks to enforce a maximum 72 bytes per line.
+     *
+     * @deprecation Replaced with {@link #write72}.
      */
+    @Deprecated(since = "12")
     static void make72Safe(StringBuffer line) {
         int length = line.length();
         int index = 72;
@@ -179,6 +177,35 @@
         return;
     }
 
+    private static final String LINE_BREAK = "\r\n";
+    private static final String CONTINUATION_SPACE = " ";
+    private static final byte[] LINE_BREAK_BYTES = LINE_BREAK.getBytes(UTF_8);
+    private static final byte[] LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES =
+            (LINE_BREAK + CONTINUATION_SPACE).getBytes(UTF_8);
+
+    /**
+     * Writes {@code line} to {@code out} with line breaks and continuation
+     * spaces within the limits of 72 bytes of contents on one line.
+     */
+    static void write72(OutputStream out, String line) throws IOException {
+        if (line.length() > 0) {
+            byte[] lineBytes = line.getBytes(UTF_8);
+            int length = lineBytes.length;
+            // first line can hold one byte more than subsequent lines which
+            // start with a continuation line break space
+            out.write(lineBytes[0]);
+            int index = 1;
+            while (length - index > 71) {
+                out.write(lineBytes, index, 71);
+                index += 71;
+                out.write(LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES);
+            }
+            out.write(lineBytes, index, length - index);
+        }
+
+        out.write(LINE_BREAK_BYTES);
+    }
+
     /**
      * Reads the Manifest from the specified InputStream. The entry
      * names and attributes read will be merged in with the current
@@ -238,7 +265,7 @@
                     lastline = buf;
                     continue;
                 }
-                name = new String(buf, 0, buf.length, "UTF8");
+                name = new String(buf, 0, buf.length, UTF_8);
                 lastline = null;
             }
             Attributes attr = getAttributes(name);
@@ -264,7 +291,7 @@
             toLower(lbuf[2]) == 'm' && toLower(lbuf[3]) == 'e' &&
             lbuf[4] == ':' && lbuf[5] == ' ') {
             try {
-                return new String(lbuf, 6, len - 6, "UTF8");
+                return new String(lbuf, 6, len - 6, UTF_8);
             }
             catch (Exception e) {
             }

Reply via email to