Re: 6202130: Need to handle UTF-8 values and break up lines longer than 72 bytes

Philipp Kunz Mon, 13 Apr 2020 10:32:50 -0700

Hi Naoto,
You are absolutely right to raise the question. I've also thought about
this but haven't come up so far with a compellingly elegant solution,
at least not yet.
If only String.isLatin1 was public that would come in very handy.
Character or anything else I looked at cannot tell if a string is
ascii. BreakIterator supports iterating backwards so we could start at
the potential line end but that requires a start position that is a
boundary to start with and that is not obviously possible due to the
regional indicators and probably other code points requiring stateful
processing. Same with regexes and I wouldn't know how to express groups
that could count bytes. It does not even seem to be possible to guess
any number of characters to start searching for a boundary because of
the statefulness. Even the most simple approach to detect latin1
Strings requires an encoding or a regex such as "[^\\p{ASCII}]" which
essentially is another inefficient loop. It also does not work to
encode the string into UTF-8 in a single pass because then it is not
known which grapheme boundary matches to which byte position. I also
tried to write the header names and the ": " delimiter without breaking
first but it did not seem to significantly affect performance.
UTF_8.newEncoder cannot process single surrogates, admittedly an edge
case, but required for compatibility. I added a fast path, see patch,
the best way I could think of. Did I miss a better way to tell ascii
strings from others?
What I found actually improving performance is based on the
consideration that strings are composed of primitive chars that will be
encoded into three bytes each maximum always that up to 24 characters
can be passed to writeChar72 at a time reducing the number of
writeChar72 and in turn String.getBytes invocations. The number of
characters that can be passed to writeChar72 is at most the number of
bytes remaining on the current manifest line (linePos) divided by three
but at least one. See attached patch.
Regards,Philipp


On Mon, 2020-03-30 at 14:31 -0700, naoto.s...@oracle.com wrote:
> Hi Philipp,
> Sorry for the delay.
> On 3/25/20 11:52 AM, Philipp Kunz wrote:
> > Hi Naoto,
> > See another patch attached with Locale.ROOT for the BreakIterator.
> > I will be glad to hear of any feedback.
> 
> I wonder how your change affects the performance, as it will do
> String.getBytes(UTF-8) per each character. I think this can
> definitely be improved by adding some fastpath, e.g., for ASCII. The
> usage of the BreakIterator is fine, though.
> > There is another patch [1] around dealing with manifest attributes
> > during application launch. It certainly is related to 6202130 but
> > feels like a distinct set of changes with a slightly different
> > concern. Any opinion as how to proceed with that one?
> 
> I am not quite sure which patch you are referring to, but I agree
> that creating an issue would not hurt.
> Naoto
> > Regards,Philipp
> > 
> > 
> > [1] 
> > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-February/064720.html
> > 
> > 
> > On Mon, 2020-03-23 at 09:05 -0700, naoto.s...@oracle.com wrote:
> > > Hi Philipp,
> > > Right, Locale.ROOT is more appropriate here by definition, though
> > > theimplementation is the same.
> > > Naoto
> > > On 3/21/20 5:19 AM, Philipp Kunz wrote:
> > > > Hi Naoto and everyone,
> > > > There are almost as many occurrences of Locale.ROOT as
> > > > Locale.US whichmade me wonder which one is more appropriately
> > > > locale-independent andwhich is probably another topic and not
> > > > actually relevant here
> > > > becauseBreakIterator.getCharacterInstance is locale-agnostic as
> > > > far as I can tell.
> > > > See attached patch with another attempt to fix bug 6202130.
> > > > Regards,Philipp
> > > > 
> > > > On Tue, 2020-03-10 at 10:45 -0700,naoto.s...@oracle.com
> > > >  <mailto:naoto.s...@oracle.com>  wrote:
> > > > > Hi Philipp,
> > > > > ..., so using BreakIterator (withLocale.US) is more preferred
> > > > > solution to me.
> > > > > Naoto

diff -r 9d4956ec144e src/java.base/share/classes/java/util/jar/Attributes.java
--- a/src/java.base/share/classes/java/util/jar/Attributes.java	Fri Apr 10 16:21:12 2020 -0400
+++ b/src/java.base/share/classes/java/util/jar/Attributes.java	Mon Apr 13 19:15:14 2020 +0200
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1997, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1997, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -301,7 +301,6 @@
 
     /*
      * Writes the current attributes to the specified data output stream.
-     * XXX Need to handle UTF8 values and break up lines longer than 72 bytes
      */
     void write(DataOutputStream out) throws IOException {
         StringBuilder buffer = new StringBuilder(72);
@@ -319,8 +318,6 @@
      * Writes the current attributes to the specified data output stream,
      * make sure to write out the MANIFEST_VERSION or SIGNATURE_VERSION
      * attributes first.
-     *
-     * XXX Need to handle UTF8 values and break up lines longer than 72 bytes
      */
     void writeMain(DataOutputStream out) throws IOException {
         StringBuilder buffer = new StringBuilder(72);
diff -r 9d4956ec144e src/java.base/share/classes/java/util/jar/Manifest.java
--- a/src/java.base/share/classes/java/util/jar/Manifest.java	Fri Apr 10 16:21:12 2020 -0400
+++ b/src/java.base/share/classes/java/util/jar/Manifest.java	Mon Apr 13 19:15:14 2020 +0200
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1997, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1997, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -30,6 +30,8 @@
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
+import java.text.BreakIterator;
+import java.util.Locale;
 import java.util.HashMap;
 import java.util.Map;
 
@@ -230,30 +232,106 @@
     }
 
     /**
-     * Writes {@code line} to {@code out} with line breaks and continuation
-     * spaces within the limits of 72 bytes of contents per line followed
-     * by a line break.
+     * Writes {@code line} to {@code out} with at most 72 bytes per line.
+     * Sub-sequences encoding characters and grapheme clusters in UTF-8
+     * with less than 72 bytes are not interrupted with line breaks.
      */
     static void println72(OutputStream out, String line) throws IOException {
-        if (!line.isEmpty()) {
+        // fast-path for short headers that don't require any line breaks
+        if (line.length() <= 72) {
             byte[] lineBytes = line.getBytes(UTF_8.INSTANCE);
-            int length = lineBytes.length;
-            // first line can hold one byte more than subsequent lines which
-            // start with a continuation line break space
-            out.write(lineBytes[0]);
-            int pos = 1;
-            while (length - pos > 71) {
-                out.write(lineBytes, pos, 71);
-                pos += 71;
+            if (lineBytes.length <= 72) {
+                out.write(lineBytes);
                 println(out);
-                out.write(' ');
+                return;
             }
-            out.write(lineBytes, pos, length - pos);
+        }
+
+        int linePos = 0; // number of bytes already put out on current line
+        BreakIterator characterBoundaryIterator =
+                BreakIterator.getCharacterInstance(Locale.ROOT);
+        characterBoundaryIterator.setText(line);
+        int start = characterBoundaryIterator.first(), end = start;
+        int lineLength = line.length();
+        for (int next = characterBoundaryIterator.next();
+                end < lineLength; start = end) {
+            do {
+                end = next;
+                next = characterBoundaryIterator.next();
+            } while (next != BreakIterator.DONE &&
+                    // three bytes per char maximum
+                    72 - linePos >= 3 * (next - start));
+
+            linePos = writeChar72(out, linePos, line.substring(start, end));
         }
         println(out);
     }
 
     /**
+     * Writes {@code characterString} in one piece on the current line or on a
+     * new continuation line after a line break unless it exceeds 71 bytes in
+     * its UTF-8 encoded form. More than 71 bytes cannot fit in a manifest line
+     * and will be broken across line breaks, reverting to respecting code point
+     * boundaries.
+     */
+    private static int writeChar72(OutputStream out, int linePos,
+            String characterString) throws IOException {
+        // characterString contains one grapheme cluster consisting of a
+        // possibly unlimited number of primitive chars.
+        byte[] characterBytes = characterString.getBytes(UTF_8.INSTANCE);
+        int characterLength = characterBytes.length;
+        int characterPos = 0; // number of bytes of current character
+        // already put out
+
+        // Put out a break onto a new line if the character or rather grapheme
+        // cluster does not fit on the current line anymore but fits on a new
+        // line.
+        // In other words, only if the current grapheme cluster exceeds 71 bytes
+        // in its UTF-8 encoded form and therefore does not fit on one whole
+        // continuation line alone, skip this line break and fill the current
+        // line first before putting a break inside of the grapheme cluster.
+        if (linePos + characterLength > 72 && characterLength < 72) {
+            println(out);
+            out.write(' ');
+            linePos = 1;
+        }
+
+        // Break exceptionally large grapheme clusters that don't fit on one
+        // line at code point boundaries.
+        while (linePos + characterLength - characterPos > 72) {
+            int nextBreakPos = 72 - linePos;
+            while (isContinuationByte(
+                    characterBytes[characterPos + nextBreakPos])) {
+                nextBreakPos--;
+            }
+            out.write(characterBytes, characterPos, nextBreakPos);
+            characterPos += nextBreakPos;
+            println(out);
+            out.write(' ');
+            linePos = 1;
+        }
+
+        int remainder = characterLength - characterPos;
+        out.write(characterBytes, characterPos, remainder);
+        linePos += remainder;
+        return linePos;
+    }
+
+    /**
+     * Returns {@code true} if the passed byte as parameter {@code b}
+     * is not the first (or only) byte of a Unicode character encoded in UTF-8
+     * and {@code false} otherwise.
+     *
+     * @see <a href="https://tools.ietf.org/html/rfc3629#section-3";>
+     * RFC 3629 - UTF-8, a transformation format of ISO 10646</a>
+     * @see StringCoding#isNotContinuation(int)
+     * @see sun.nio.cs.UTF_8.Decoder#isNotContinuation(int)
+     */
+    private static boolean isContinuationByte(byte b) {
+        return (b & 0xc0) == 0x80;
+    }
+
+    /**
      * Writes a line break to {@code out}.
      */
     static void println(OutputStream out) throws IOException {
diff -r 9d4956ec144e test/jdk/java/util/jar/Manifest/LineBreakCharacter.java
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test/jdk/java/util/jar/Manifest/LineBreakCharacter.java	Mon Apr 13 19:15:14 2020 +0200
@@ -0,0 +1,946 @@
+/*
+ * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.text.BreakIterator;
+import java.util.Locale;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.jar.Attributes;
+import java.util.jar.Manifest;
+import java.util.jar.Attributes.Name;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Random;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;
+
+import org.testng.annotations.*;
+import static org.testng.Assert.*;
+
+/**
+ * @test
+ * @bug 6443578 6202130
+ * @compile ../../../../sun/security/tools/jarsigner/Utils.java
+ * @run testng LineBreakCharacter
+ * @summary Tests {@link Manifest#writeChar72} breaking manifest header values
+ * across lines in conjunction with Unicode characters encoded in UTF-8 with a
+ * variable number of bytes when reading and writing jar manifests results in
+ * valid UTF-8.
+ * <p>
+ * This test verifies that no Unicode code point character or grapheme cluster
+ * up to 71 bytes of length encoded in UTF-8 is broken apart across a line break
+ * when writing manifests and also that manifests are still read correctly
+ * whether or not characters encoded in UTF-8 with more than one byte are
+ * interrupted with and continued after a line break for compatibility when
+ * reading older manifests.
+ * <p>
+ * For another test covering the complete Unicode character set see
+ * {@link ValueUtf8Coding}.
+ */
+public class LineBreakCharacter {
+
+    static final int MANIFEST_LINE_CONTENT_WIDTH_BYTES = 72;
+    static final int MANIFEST_CONTINUATION_LINE_CONTENT_WIDTH_BYTES =
+            MANIFEST_LINE_CONTENT_WIDTH_BYTES - 1;
+
+    /**
+     * Character string that has one byte size in its UTF-8 encoded form to
+     * yield one byte of position offset.
+     */
+    static final String FILL1BYTE = "x";
+    static final String MARK_BEFORE = "y";
+    static final String MARK_AFTER = "z";
+
+    /**
+     * Four byte name.
+     * By using header names of four characters length the same values can be
+     * used for testing line breaks in both headers (in main attributes as well
+     * as named sections) as well as section names because a named section name
+     * is represented basically like any other header but follows an empty line
+     * and the key is always "Name".
+     * Relative to the start of the value, this way the same offset to the
+     * character to test breaking can be used in all cases.
+     */
+    static final String FOUR_BYTE_NAME = "Name";
+    static final int NAME_SEP_LENGTH = (FOUR_BYTE_NAME + ": ").length();
+
+    /**
+     * Distinguishes main attributes headers, section names, and headers in
+     * named sections because an implementation might make a difference.
+     */
+    enum PositionInManifest {
+        /**
+         * @see Attributes#writeMain
+         */
+        MAIN_ATTRIBUTES,
+        /**
+         * @see Attributes#write
+         */
+        SECTION_NAME,
+        /**
+         * @see Manifest#write
+         */
+        NAMED_SECTION;
+    }
+
+    static String numByteUnicodeCharacter(int numBytes) {
+        String string;
+        switch (numBytes) {
+        case 1: string = "i"; break;
+        case 2: string = "\u00EF"; break; // small letter i with diaresis
+        case 3: string = "\uFB00"; break; // small double f ligature
+        case 4: string = Character.toString(0x2070E); break; // surrogate pair
+        default: throw new AssertionError();
+        }
+        assertEquals(string.getBytes(UTF_8).length, numBytes,
+                "self-test failed: unexpected UTF-8 encoded character length");
+        return string;
+    }
+
+    /**
+     * Produces test cases with all combinations of circumstances covered in
+     * which a character could possibly be attempted to be broken across a line
+     * break onto a continuation line:<ul>
+     * <li>different sizes of a UTF-8 encoded characters: one, two, three, and
+     * four bytes,</li>
+     * <li>all possible positions of the character to test breaking with
+     * relative respect to the 72-byte line length limit including immediately
+     * before that character and immediately after the character and every
+     * position in between for multi-byte UTF-8 encoded characters,</li>
+     * <li>different number of preceding line breaks in the same value</li>
+     * <li>at the end of the value or followed by another character</li>
+     * <li>in a main attributes header value, section name, or named section
+     * header value (see also {@link #PositionInManifest})</li>
+     * </ul>
+     * The same set of test parameters is used to write and read manifests
+     * once without breaking characters apart
+     * ({@link #testWriteLineBreaksKeepCharactersTogether(int, int, int, int,
+     * PositionInManifest, String, String)}) and once with doing so
+     * ({@link #readCharactersBrokenAcrossLines(int, int, int, int,
+     * PositionInManifest, String, String)}) the latter case of which covering
+     * backwards compatibility and involves writing manifests like they were
+     * written before resolution of bugs 6202130 and 6443578.
+     */
+    @DataProvider(name = "charBreakParameters")
+    public static Object[][] charBreakParameters() {
+        return Stream.of(new Object[] { null }
+        ).flatMap(o ->
+            // b: number of line breaks before character under test
+            IntStream.rangeClosed(0, 3).mapToObj(
+                b -> new Object[] { b })
+        ).flatMap(o ->
+            // c: unicode character UTF-8 encoded length in bytes
+            IntStream.rangeClosed(1, 4).mapToObj(
+                c -> new Object[] { o[0], c })
+        ).flatMap(o ->
+            // p: potential break position offset in bytes
+            // p == 0 => before character,
+            // p == c => after character, and
+            // 0 < p < c => position within the character's byte sequence
+            IntStream.rangeClosed(0, (int) o[1]).mapToObj(
+                p -> new Object[] { o[0], o[1], p })
+        ).flatMap(o ->
+            // a: no or one character following the one under test
+            // (a == 0 meaning the character under test is the end of
+            // the value which is followed by a line break in the
+            // resulting manifest without continuation line space which
+            // concludes the value)
+            IntStream.rangeClosed(0, 1).mapToObj(
+                a -> new Object[] { o[0], o[1], o[2], a })
+        ).flatMap(o ->
+            Arrays.stream(PositionInManifest.values()).map(
+                i -> new Object[] { o[0], o[1], o[2], o[3], i })
+        ).map(o -> {
+            int b = (int) o[0];
+            int c = (int) o[1];
+            int p = (int) o[2];
+            int a = (int) o[3];
+            PositionInManifest i = (PositionInManifest) o[4];
+
+            // offset: so many characters (actually bytes here, because filled
+            // with one byte characters) are needed to place the next character
+            // (the character under test) into a position relative to the
+            // maximum line width that it may or may not have to be broken onto
+            // the next line
+            int offset =
+                    // number of lines; - 1 due to continuation " "
+                    b * (MANIFEST_LINE_CONTENT_WIDTH_BYTES - 1)
+                    // first line available content space (after "Name: ")
+                    + MANIFEST_LINE_CONTENT_WIDTH_BYTES - NAME_SEP_LENGTH
+                    // position of line width limit relative to tested character
+                    - p;
+
+            String value = "";
+            value += FILL1BYTE.repeat(offset - 1);
+            // character before the one to test the break
+            value += MARK_BEFORE;
+            String character = numByteUnicodeCharacter(c);
+            value += character;
+            // (optional) character after the one to test the break
+            value += MARK_AFTER.repeat(a);
+
+            return new Object[] { b, c, p, a, i, character, value };
+        }).toArray(size -> new Object[size][]);
+    }
+
+    /**
+     * Checks that unicode characters work well with line breaks and
+     * continuation lines in jar manifests without breaking a character across
+     * a line break even when encoded in UTF-8 with more than one byte.
+     * <p>
+     * For each of the cases provided by {@link #charBreakParameters()} the
+     * break position is verified in the written manifest binary form as well
+     * as verified that it restores to the original values when read again.
+     * <p>
+     * As an additional check, the binary manifests are decoded from UTF-8
+     * into strings before re-joining continued lines.
+     */
+    @Test(dataProvider = "charBreakParameters")
+    public void testWriteLineBreaksKeepCharactersTogether(int b, int c, int p,
+            int a, PositionInManifest i, String character, String value)
+                    throws Exception {
+        // in order to unambiguously establish the position of "character" in
+        // brokenPart, brokenPart is prepended and appended with what is
+        // expected before and after it...
+        String brokenPart = MARK_BEFORE;
+
+        // expect the whole character on the next line unless it fits
+        // completely on the current line
+        boolean breakExpectedBeforeCharacter = p < c;
+        if (breakExpectedBeforeCharacter) {
+            brokenPart += "\r\n ";
+        }
+        brokenPart += character;
+        // expect a line break before the next character if there is a next
+        // character and the previous not already broken on next line
+        if (a > 0) {
+            if (!breakExpectedBeforeCharacter) {
+                brokenPart += "\r\n ";
+            }
+            brokenPart += MARK_AFTER;
+        }
+        brokenPart = brokenPart + "\r\n";
+
+        byte[] mfBytes = writeManifest(i, FOUR_BYTE_NAME, value);
+        try {
+            assertOccurrence(mfBytes, brokenPart.getBytes(UTF_8));
+            readManifestAndAssertValue(mfBytes, i, FOUR_BYTE_NAME, value);
+            decodeManifestAsUtf8AndAssertValue(
+                    mfBytes, FOUR_BYTE_NAME, value, true);
+        } catch (AssertionError e) {
+            Utils.echoManifest(mfBytes, "faulty manifest: " + e);
+            throw e;
+        }
+    }
+
+    byte[] writeManifest(PositionInManifest i, String name,
+            String value) throws IOException {
+        Manifest mf = new Manifest();
+        mf.getMainAttributes().put(Name.MANIFEST_VERSION, "1.0");
+
+        switch (i) {
+        case MAIN_ATTRIBUTES:
+            mf.getMainAttributes().put(new Name(name), value);
+            break;
+        case SECTION_NAME:
+            mf.getEntries().put(value, new Attributes());
+            break;
+        case NAMED_SECTION:
+            Attributes attributes = new Attributes();
+            mf.getEntries().put(FOUR_BYTE_NAME, attributes);
+            attributes.put(new Name(name), value);
+            break;
+        }
+
+        ByteArrayOutputStream out = new ByteArrayOutputStream();
+        mf.write(out);
+        return out.toByteArray();
+    }
+
+    /**
+     * Asserts one and only one occurrence of a sequence of bytes {@code part}
+     * representing the character and how it is expected to be broken and its
+     * surrounding bytes in a larger sequence that corresponds to the manifest
+     * in binary form {@code mf}.
+     */
+    void assertOccurrence(byte[] mf, byte[] part) {
+        List<Integer> matchPos = new LinkedList<>();
+        for (int i = 0; i < mf.length; i++) {
+            for (int j = 0; j < part.length && i + j <= mf.length; j++) {
+                if (part[j] == 0) {
+                    if (i + j != mf.length) {
+                        break; // expected eof not found
+                    }
+                } else if (i + j == mf.length) {
+                    break;
+                } else if (mf[i + j] != part[j]) {
+                    break;
+                }
+                if (j == part.length - 1) {
+                    matchPos.add(i);
+                }
+            }
+        }
+        assertEquals(matchPos.size(), 1, "not "
+                + (matchPos.size() < 1 ? "found" : "unique") + ": '"
+                + new String(part, UTF_8) + "'");
+    }
+
+    void readManifestAndAssertValue(
+            byte[] mfBytes, PositionInManifest i, String name, String value)
+                    throws IOException {
+        Manifest mf = new Manifest(new ByteArrayInputStream(mfBytes));
+
+        switch (i) {
+        case MAIN_ATTRIBUTES:
+            assertEquals(mf.getMainAttributes().getValue(name), value,
+                    "main attributes header value");
+            break;
+        case SECTION_NAME:
+            Attributes attributes = mf.getAttributes(value);
+            assertNotNull(attributes, "named section not found");
+            break;
+        case NAMED_SECTION:
+            attributes = mf.getAttributes(FOUR_BYTE_NAME);
+            assertEquals(attributes.getValue(name), value,
+                    "named section attributes header value");
+            break;
+        }
+    }
+
+    /**
+     * Decodes a binary manifest {@code mfBytes} into UTF-8 first, before
+     * joining the continuation lines unlike {@link Manifest} and
+     * {@link Attributes} which join the continuation lines first, before
+     * decoding the joined line from UTF-8 into a {@link String}, evaluating
+     * whether or not the binary manifest is valid UTF-8.
+     */
+    void decodeManifestAsUtf8AndAssertValue(
+            byte[] mfBytes, String name, String value,
+            boolean validUtf8ManifestExpected) throws IOException {
+        String mf = new String(mfBytes, UTF_8)
+                .replaceAll("(\\r\\n|(?!\\r)\\n|\\r(?!\\n)) ", "");
+        String header = "\r\n" + name + ": " + value + "\r\n";
+        int pos = mf.indexOf(header);
+        if (validUtf8ManifestExpected) {
+            assertTrue(pos > 0);
+            pos = mf.indexOf(header, pos + 1);
+        }
+        // assert no ocurrence or no other occurrence after one match above
+        assertTrue(pos == -1);
+    }
+
+    @Test(dataProvider = "charBreakParameters")
+    public void readCharactersBrokenAcrossLines(int b, int c, int p, int a,
+            PositionInManifest i, String character, String value)
+                    throws Exception {
+        ByteArrayOutputStream buf = new ByteArrayOutputStream();
+        buf.write(MARK_BEFORE.getBytes(UTF_8));
+        byte[] characterBytes = character.getBytes(UTF_8);
+        // the portion of the character that fits on the current line before
+        // a break at 72 bytes, ranges from nothing (p == 0) to the whole
+        // character (p == c)
+        for (int j = 0; j < p; j++) {
+            buf.write(characterBytes, j, 1);
+        }
+        // expect a line break at exactly 72 bytes from the beginning of the
+        // line unless the whole character fits on that line
+        boolean breakExpectedBeforeCharacter = p < c;
+        if (breakExpectedBeforeCharacter) {
+            buf.write("\r\n ".getBytes(UTF_8));
+        }
+        // the remaining portion of the character, if any
+        for (int j = p; j < c; j++) {
+            buf.write(characterBytes, j, 1);
+        }
+        // expect another line break if the whole character fitted on the same
+        // line and there is another character
+        if (a == 1) {
+            if (c == p) {
+                buf.write("\r\n ".getBytes(UTF_8));
+            }
+            buf.write(MARK_AFTER.getBytes(UTF_8));
+        }
+        // expect a line break immediately after the value
+        buf.write("\r\n".getBytes(UTF_8));
+        byte[] brokenPart = buf.toByteArray();
+
+        byte[] mfBytes = writeManifestWithBrokenCharacters(
+                i, FOUR_BYTE_NAME, value);
+        try {
+            assertOccurrence(mfBytes, brokenPart);
+            readManifestAndAssertValue(mfBytes, i, FOUR_BYTE_NAME, value);
+            decodeManifestAsUtf8AndAssertValue(
+                    mfBytes, FOUR_BYTE_NAME, value, p == 0 || p == c);
+        } catch (AssertionError e) {
+            Utils.echoManifest(mfBytes, "faulty manifest: " + e);
+            throw e;
+        }
+    }
+
+    /**
+     * From the previous {@link Manifest} implementation reduced to the minimum
+     * required to demonstrate compatibility.
+     */
+    @SuppressWarnings("deprecation")
+    static byte[] writeManifestWithBrokenCharacters(
+            PositionInManifest i, String name, String value)
+                    throws IOException {
+        byte[] vb = value.getBytes(UTF_8);
+        value = new String(vb, 0, 0, vb.length);
+        ByteArrayOutputStream out = new ByteArrayOutputStream();
+        DataOutputStream dos = new DataOutputStream(out);
+        dos.writeBytes(Name.MANIFEST_VERSION + ": 0.1\r\n");
+
+        if (i == PositionInManifest.MAIN_ATTRIBUTES) {
+            StringBuffer buffer = new StringBuffer(name);
+            buffer.append(": ");
+            buffer.append(value);
+            make72Safe(buffer);
+            buffer.append("\r\n");
+            dos.writeBytes(buffer.toString());
+        }
+        dos.writeBytes("\r\n");
+
+        if (i == PositionInManifest.SECTION_NAME ||
+                i == PositionInManifest.NAMED_SECTION) {
+            StringBuffer buffer = new StringBuffer("Name: ");
+            if (i == PositionInManifest.SECTION_NAME) {
+                buffer.append(value);
+            } else {
+                buffer.append(FOUR_BYTE_NAME);
+            }
+            make72Safe(buffer);
+            buffer.append("\r\n");
+            dos.writeBytes(buffer.toString());
+
+            if (i == PositionInManifest.NAMED_SECTION) {
+                buffer = new StringBuffer(name);
+                buffer.append(": ");
+                buffer.append(value);
+                make72Safe(buffer);
+                buffer.append("\r\n");
+                dos.writeBytes(buffer.toString());
+            }
+
+            dos.writeBytes("\r\n");
+        }
+
+        dos.flush();
+        return out.toByteArray();
+    }
+
+    /**
+     * Adds line breaks to enforce a maximum 72 bytes per line.
+     * <p>
+     * From previous Manifest implementation without respect for UTF-8 encoded
+     * character boundaries breaking also within multi-byte UTF-8 encoded
+     * characters.
+     *
+     * @see {@link Manifest#make72Safe}
+     */
+    static void make72Safe(StringBuffer line) {
+        int length = line.length();
+        int index = 72;
+        while (index < length) {
+            line.insert(index, "\r\n ");
+            index += 74; // + line width + line break ("\r\n")
+            length += 3; // + line break ("\r\n") and space
+        }
+    }
+
+    @Test
+    public void testEmptyValue() throws Exception {
+        assertBreakPositions("");
+        for (PositionInManifest i : PositionInManifest.values()) {
+            byte[] mfBytes = writeManifest(i, FOUR_BYTE_NAME, "");
+            readManifestAndAssertValue(mfBytes, i, FOUR_BYTE_NAME, "");
+        }
+    }
+
+    void assertBreakPositions(String originalValue, int... breakPositionsBytes)
+            throws IOException {
+        String expectedValueWithBreaksInManifest = originalValue;
+        // iterating backwards because inserting breaks affects positions
+        for (int i = breakPositionsBytes.length - 1; i >= 0; i--) {
+            int breakPositionBytes = breakPositionsBytes[i];
+
+            // Translate breakPositionBytes byte offset into
+            // breakPositionCharacters (primitive char type) character offset
+            // for cutting the string with String.substring lateron.
+            // Higher code points may be represented with two UTF-16 surrogate
+            // pair characters which both count for String.substring.
+            int bytesSoFar = 0;
+            int charsSoFar = 0;
+            while (bytesSoFar < breakPositionBytes) {
+                bytesSoFar += expectedValueWithBreaksInManifest
+                        .substring(charsSoFar, charsSoFar + 1)
+                        .getBytes(UTF_8).length;
+                charsSoFar++;
+                assertTrue(bytesSoFar <= breakPositionBytes,
+                        "break position not aligned with characters");
+            }
+            int breakPositionCharacters = charsSoFar;
+
+            expectedValueWithBreaksInManifest =
+                    expectedValueWithBreaksInManifest
+                            .substring(0, breakPositionCharacters)
+                    + "\r\n " +
+                    expectedValueWithBreaksInManifest
+                            .substring(breakPositionCharacters);
+        }
+
+        PositionInManifest i = PositionInManifest.MAIN_ATTRIBUTES;
+        byte[] actual = writeManifest(i , FOUR_BYTE_NAME, originalValue);
+        readManifestAndAssertValue(actual, i, FOUR_BYTE_NAME, originalValue);
+        String expected =
+                "Manifest-Version: 1.0\r\n" +
+                FOUR_BYTE_NAME + ": " + expectedValueWithBreaksInManifest +
+                "\r\n\r\n";
+        try {
+            assertEquals(new String(actual, UTF_8), expected);
+            assertEquals(actual, expected.getBytes(UTF_8));
+        } catch (AssertionError e) {
+            Utils.echoManifest(actual, "faulty manifest: " + e);
+            System.out.println("actual bytes   = " +
+                    byteArrayToIntList(actual));
+            System.out.println("expected bytes = " +
+                    byteArrayToIntList(expected.getBytes(UTF_8)));
+            System.out.println("actual chars   = " +
+                    new String(actual, UTF_8).chars().mapToObj(j -> j).collect(Collectors.toList()));
+            System.out.println("expected chars = " +
+                    expected.chars().mapToObj(j -> j).collect(Collectors.toList()));
+            System.out.println("actual code points   = " +
+                    new String(actual, UTF_8).codePoints().mapToObj(j -> j).collect(Collectors.toList()));
+            System.out.println("expected code points = " +
+                    expected.codePoints().mapToObj(j -> j).collect(Collectors.toList()));
+            throw e;
+        }
+    }
+
+    static List<Integer> byteArrayToIntList(byte[] bytes) {
+        List<Integer> list = new ArrayList<>();
+        for (int i = 0; i < bytes.length; i++) {
+            list.add((int) bytes[i]);
+        }
+        return list;
+    }
+
+    @Test
+    public void testBreakAlignmentSelfTest() throws Exception {
+        assertThrows(AssertionError.class, () ->
+            assertBreakPositions(getCharSeq(2), 1));
+        assertThrows(AssertionError.class, () ->
+            assertBreakPositions("\uD841\uDF0E", 2));
+    }
+
+    static IntStream concat(IntStream... in) {
+        IntStream r = in[0];
+        for (int i = 1; i < in.length; i++) {
+            r = IntStream.concat(r, in[i]);
+        }
+        return r;
+    }
+
+    static final String[] COMBINING_MARKS_TWO = // two bytes each
+        IntStream.range(0x300, 0x36F) // Combining Diacritical Marks
+        .mapToObj(Character::toString).toArray(size -> new String[size]);
+
+    static final String COMBINING_MARKS_TWO_JOINED =
+        Arrays.stream(COMBINING_MARKS_TWO).collect(Collectors.joining());
+
+    static String getCharSeq2(int numberOfBytes) {
+        String seq = (numberOfBytes % 2 == 1 ? "e" : "\u00E6") +
+              COMBINING_MARKS_TWO_JOINED.substring(0, (numberOfBytes - 1) / 2);
+        assertEquals(seq.getBytes(UTF_8).length, numberOfBytes);
+        assertSingleGraphemeBreak(seq);
+        return seq;
+    }
+
+    static final Random R = new Random(0);
+
+    static final String[] COMBINING_MARKS_THREE = concat( // three bytes each
+        IntStream.range(0x1AB0, 0x1AFF) // Comb. Diacritical Marks Extended
+            .filter(cp -> !(0x1AC1 <= cp && cp <= 0x1AFF)) // reserved
+            .filter(cp -> cp != 0x1AC0 && cp != 0x1ABF), // Unicode >= 13.0
+        IntStream.range(0x1DC0, 0x1DFF) // Comb. Diacr. Marks Supplement
+            .filter(cp -> cp != 0x1DFA), // reserved
+        IntStream.range(0x20D0, 0x20FF) // Comb. Diacr. Marks for Symbols
+            .filter(cp -> !(0x20F1 <= cp && cp <= 0x20FF)), // reserved
+        IntStream.range(0xFE20, 0xFE2F)) // Combining Half Marks
+        .mapToObj(Character::toString).toArray(size -> new String[size]);
+
+    static String getCharSeq(int numberOfBytes) {
+        String seq;
+        int n = numberOfBytes % 5;
+        switch (n) {
+        case 0:
+            seq = numByteUnicodeCharacter(3) + "\u0302";
+            n = 5;
+            break;
+        case 4:
+            seq = numByteUnicodeCharacter(2) + "\u0303";
+            break;
+        default:
+            seq = numByteUnicodeCharacter(n);
+        }
+        while (n < numberOfBytes) {
+            int r2 = R.nextInt(COMBINING_MARKS_TWO.length);
+            int r3 = R.nextInt(COMBINING_MARKS_THREE.length);
+            if (n % 2 == 1) {
+                seq += COMBINING_MARKS_TWO[r2] + COMBINING_MARKS_THREE[r3];
+            } else {
+                seq += COMBINING_MARKS_THREE[r3] + COMBINING_MARKS_TWO[r2];
+            }
+            n += 5;
+        }
+        assertEquals(seq.getBytes(UTF_8).length, numberOfBytes);
+        assertSingleGraphemeBreak(seq);
+        return seq;
+    }
+
+    static void assertSingleGraphemeBreak(String string) {
+        String msg = "" + string.codePoints()
+                .mapToObj(Integer::toHexString).collect(Collectors.toList());
+        BreakIterator characterBoundaryIterator =
+                BreakIterator.getCharacterInstance(Locale.ROOT);
+        characterBoundaryIterator.setText(string);
+        assertEquals(characterBoundaryIterator.first(), 0, msg);
+        assertEquals(characterBoundaryIterator.next(), string.length(), msg);
+        assertEquals(characterBoundaryIterator.next(), BreakIterator.DONE, msg);
+    }
+
+    @Test
+    public void testBreakOnFirstLine() throws Exception {
+        // Combining sequence starts immediately after name and ": " and fits
+        // the remaining space in the first line. Expect no break.
+        assertBreakPositions(getCharSeq2(66));
+
+        // Combining sequence starts after name and ": " and exceeds the
+        // remaining space in the first line by one byte. Expect to break on a
+        // new line because the combining sequence fits on a continuation line
+        // which does not start with name and ": " and provides enough space.
+        assertBreakPositions(getCharSeq2(67), 0);
+
+        // Combining sequence starts after name and ": " and exceeds the
+        // remaining space in the first line but still just fits exactly on a
+        // continuation line. Expect the value to break onto a new line.
+        assertBreakPositions(getCharSeq2(71), 0);
+
+        // Combining sequence starts after name and ": " and exceeds the
+        // remaining space in the first line and neither fits on a continuation
+        // line. Expect that the first line to be filled with as many codepoints
+        // as fit on it and expect a line break onto a continuation line after
+        // 66 bytes of the first line value.
+        assertBreakPositions(getCharSeq2(72), 72 - NAME_SEP_LENGTH);
+
+        // Combining sequence starts after name and ": x" and exceeds the
+        // remaining space in the first line and neither fits on a continuation
+        // line. Expect that the first line to be filled with as many codepoints
+        // as fit on it and expect a line break onto a continuation line already
+        // after 65 bytes of the first line because the following character is
+        // a code point represented with two bytes in UTF-8 which should not
+        // be interrupted with a line break.
+        assertBreakPositions("x" + getCharSeq2(72), 72 - NAME_SEP_LENGTH - 1);
+    }
+
+    @Test
+    public void testBreakOnContinuationLine() throws Exception {
+        final int FIRST_LINE_FREE_SPACE = // 66 bytes
+                MANIFEST_LINE_CONTENT_WIDTH_BYTES - NAME_SEP_LENGTH;
+        final String FILL_FIRST_LINE = "x".repeat(FIRST_LINE_FREE_SPACE);
+
+        // fits on next line by skipping one byte free on current line
+        assertBreakPositions(
+                FILL_FIRST_LINE + "x".repeat(71 - 1) + getCharSeq2(71),
+                66,
+                66 + 71 - 1);
+
+        // fits on current line exactly
+        assertBreakPositions(
+                FILL_FIRST_LINE + "x".repeat(71) + getCharSeq2(71),
+                66,
+                66 + 71);
+
+        // fits on next line by inserting a line break after a line that
+        // contains only one character yet
+        assertBreakPositions(
+                FILL_FIRST_LINE + "x".repeat(71 + 1) + getCharSeq2(71),
+                66,
+                66 + 71,
+                66 + 71 + 1);
+
+        // does not fit on the next line and the one byte not yet used on the
+        // current line does not hold the first code point of the combined
+        // character sequence which is a code point encoded with two bytes in
+        // UTF-8.
+        assertTrue(getCharSeq2(72).substring(0, 1).getBytes(UTF_8).length == 2);
+        assertBreakPositions(
+                FILL_FIRST_LINE + "x".repeat(71 - 1) + getCharSeq2(72),
+                66,
+                66 + 71 - 1,
+                66 + 71 - 1 + 71 - 1);
+
+        // would not fit on the next line alone but fits on the remaining two
+        // bytes available on the current line and the whole subsequent line.
+        assertBreakPositions(
+                FILL_FIRST_LINE + "x".repeat(71 - 2) + getCharSeq2(72),
+                66,
+                66 + 71);
+
+        // previous character filled the whole previous line completely
+        // and the combined character sequence with 72 bytes still does not fit
+        // on a single line. the last code point is a two byte one so that an
+        // unused byte is left unused on the second last line.
+        assertTrue(getCharSeq2(72).length() == 36); // primitive chars
+        assertTrue(getCharSeq2(72).substring(35).getBytes(UTF_8).length == 2);
+        assertBreakPositions(
+                FILL_FIRST_LINE + "x".repeat(71) + getCharSeq2(72),
+                66,
+                66 + 71,
+                66 + 71 + 71 - 1);
+
+        // previous character left one byte used on the current line and 70
+        // bytes remaining available. the combining sequence can use all of
+        // these 70 bytes because after 70 bytes a new code point starts
+        assertBreakPositions(
+                FILL_FIRST_LINE + "x".repeat(71 + 1) + getCharSeq2(72),
+                66,
+                66 + 71,
+                66 + 71 + 71);
+    }
+
+    @DataProvider(name = "graphemeBreakParameters")
+    public static Object[][] graphemeBreakParameters() {
+        boolean quick = true; // otherwise the test is too slow
+        return Stream.of(new Object[] { null }
+        ).flatMap(o ->
+            // l: number of preceding lines
+            IntStream.of(0, 3, 5, 8).mapToObj(
+                l -> new Object[] { l })
+        ).flatMap(o ->
+            // c: number of characters on last line before grapheme
+            (((int) o[0]) == 0 ?
+                IntStream.rangeClosed(NAME_SEP_LENGTH,
+                        MANIFEST_LINE_CONTENT_WIDTH_BYTES)
+            :
+                IntStream.rangeClosed(1,
+                        MANIFEST_CONTINUATION_LINE_CONTENT_WIDTH_BYTES)
+            ).filter(c -> !quick || c % (4 + (int) o[0]) == 0
+            ).mapToObj(
+                c -> new Object[] { o[0], c })
+        ).flatMap(o ->
+            // s: grapheme cluster size in number of bytes
+            IntStream.rangeClosed(1, 214)
+            .filter(s -> !quick || s % (5 + (int) o[0] + (int) o[1]) == 0)
+            .mapToObj(
+                s -> new Object[] { o[0], o[1], s })
+        ).toArray(size -> new Object[size][]);
+    }
+
+    @Test(dataProvider = "graphemeBreakParameters")
+    public void testGraphemeBreak(int l, int c, int s) throws Exception {
+        String g = getCharSeq(s);
+        List<Integer> expectedBreakPositions = new ArrayList<>();
+        int gStartPos = c - NAME_SEP_LENGTH;
+        if (l > 0) {
+            gStartPos += MANIFEST_LINE_CONTENT_WIDTH_BYTES - 1
+                    - MANIFEST_CONTINUATION_LINE_CONTENT_WIDTH_BYTES;
+        }
+        for (int i = 0; i < l; i++) {
+            gStartPos += MANIFEST_CONTINUATION_LINE_CONTENT_WIDTH_BYTES;
+            expectedBreakPositions.add(
+                MANIFEST_LINE_CONTENT_WIDTH_BYTES - NAME_SEP_LENGTH
+                + i * MANIFEST_CONTINUATION_LINE_CONTENT_WIDTH_BYTES);
+        }
+        if (MANIFEST_LINE_CONTENT_WIDTH_BYTES - c < s
+                && s <= MANIFEST_CONTINUATION_LINE_CONTENT_WIDTH_BYTES) {
+            expectedBreakPositions.add(gStartPos);
+            c = 1; // one byte for continuation space
+        }
+
+        int linePos = c, characterPos = 0;
+        byte[] characterBytes = g.getBytes(UTF_8);
+        int characterLength = characterBytes.length;
+        while (linePos + characterLength - characterPos
+                > MANIFEST_LINE_CONTENT_WIDTH_BYTES) {
+            int nextBreakPos =
+                    MANIFEST_LINE_CONTENT_WIDTH_BYTES - linePos;
+            while (isContinuationByte(
+                    characterBytes[characterPos + nextBreakPos])) {
+                nextBreakPos--;
+            }
+            characterPos += nextBreakPos;
+            expectedBreakPositions.add(gStartPos + characterPos);
+            linePos = 1;
+        }
+
+        String value = "x".repeat(gStartPos) + g;
+        assertBreakPositions(value,
+                expectedBreakPositions.stream().mapToInt(i -> i).toArray());
+    }
+
+    /**
+     * @see Manifest#isContinuationByte
+     */
+    static boolean isContinuationByte(byte b) {
+        return (b & 0xc0) == 0x80;
+    }
+
+    byte[] writeAndReadManifestAndReadValue(
+            String writeValue, String readValue) throws IOException {
+        PositionInManifest i = PositionInManifest.MAIN_ATTRIBUTES;
+        byte[] mfBytes = writeManifest(i, FOUR_BYTE_NAME, writeValue);
+        try {
+            readManifestAndAssertValue(mfBytes, i, FOUR_BYTE_NAME, readValue);
+            decodeManifestAsUtf8AndAssertValue(
+                    mfBytes, FOUR_BYTE_NAME, readValue, true);
+        } catch (AssertionError e) {
+            Utils.echoManifest(mfBytes, "faulty manifest: " + e);
+            throw e;
+        }
+        return mfBytes;
+    }
+
+    @DataProvider(name = "surrogatesParameters")
+    public static Object[][] surrogatesParameters() {
+        List<Integer> surrogateCps =
+                IntStream.rangeClosed(0xD800, 0xDFFF)
+                .filter(cp -> // too slow otherwise
+                    cp == 0xD800 || cp == 0xDFFF || cp % 100 == 42)
+        .mapToObj(i -> i).collect(Collectors.toList());
+        return Stream.concat(
+            surrogateCps.stream() // single surrogates
+            .map(cp -> Character.toString((char) (int) cp))
+            .map(value -> new Object[] { value, "?" })
+        ,
+            surrogateCps.stream() // combinations of two
+            .flatMap(c1 -> surrogateCps.stream().map(c2 -> {
+                String write = Character.toString(c1)
+                        + Character.toString(c2);
+                boolean validPair =
+                        Character.isHighSurrogate((char) (int) c1)
+                        && Character.isLowSurrogate((char) (int) c2);
+                String read = validPair ? write : "??";
+                return new Object[] { write, read };
+            }))
+        )
+        .flatMap(o -> Stream.concat(
+            Stream.of(new Object[][] { o })
+        , Stream.concat(
+            Stream.of(new Object[][] {{ "x" + o[0], "x" + o[1] }} )
+        ,
+            Stream.of(new Object[][] {{ o[0] + "x", o[1] + "x" }} )
+        ))).toArray(size -> new Object[size][]);
+    }
+
+    @Test(dataProvider = "surrogatesParameters")
+    public void testSingleSurrogate(String writeValue, String readValue)
+            throws IOException {
+        writeAndReadManifestAndReadValue(writeValue, readValue);
+    }
+
+    @DataProvider(name = "nonCharactersParameters")
+    public static Object[][] nonCharactersParameters() {
+        List<String> nonCharacters =
+                IntStream.rangeClosed(0xFFFE, 0xFFFF) // suffix
+        .flatMap(s ->
+                IntStream.rangeClosed(0, 0x10) // prefix
+        .map(p -> p + s)).mapToObj(Character::toString)
+        .collect(Collectors.toList());
+        nonCharacters.add("x");
+        return Stream.concat(
+                nonCharacters.stream(), // single characters
+                nonCharacters.stream() // combinations of two
+                .flatMap(c1 -> nonCharacters.stream().map(c2 -> c1 + c2))
+        ).map(value -> new Object[] { value })
+        .toArray(size -> new Object[size][]);
+    }
+
+    /*
+     * This test may not relate too strongly to writing manifests but I thought
+     * that as well with surrogates, initially, which was wrong.
+     */
+    @Test(dataProvider = "nonCharactersParameters")
+    public void testNonCharacters(String value) throws IOException {
+        writeAndReadManifestAndReadValue(value, value);
+    }
+
+    /*
+     * This test may not relate too strongly to writing manifests but I thought
+     * that as well with surrogates, initially, which eventually was wrong.
+     */
+    @DataProvider(name = "specialCharactersParameters")
+    public static Object[][] specialCharactersParameters() {
+        String[] specialCharacters = new String[] {
+            // Control Characters - Break Inhibiting
+            "\u2011", // Non-breaking hyphen
+            "\u00A0", // No-break space
+            "\u202F", // Narrow no-break space
+            "\u2060xy\u2060", // Word Joiner
+            "\u200C", // Zero-Width Non-Joiner
+            "\u200D", // Zero-Width Joiner
+            // Control Characters - Break Enabling
+            "\u00AD", // Soft hyphen
+            "\u200B", // Zero-width space
+            // Spaces
+            "\u2000", // En Quad
+            "\u2001", // Em Quad
+            "\u2002", // En Space
+            "\u2003", // Em Space
+            "\u2004", // Three-Per-Em Space
+            "\u2005", // Four-Per-Em Space
+            "\u2006", // Six-Per-Em Space
+            "\u2007", // Figure Space
+            "\u2008", // Punctuation Space
+            "\u2009", // Thin Space
+            "\u200A", // Hair Space
+            "\u205F", // Medium Mathematical Space
+            // Separators
+            "\u2028", // Line Separator
+            "\u2029", // Paragraph Separator
+            // Byte Order Marks
+            "\uFEFF",
+            "\uFFFE",
+        };
+        return Stream.concat(
+                Arrays.stream(specialCharacters), // single characters alone
+                Arrays.stream(specialCharacters) // between two other characters
+                .map(c -> "x" + c + "y")
+        ).map(value -> new Object[] { value })
+        .toArray(size -> new Object[size][]);
+    }
+
+    @Test(dataProvider = "specialCharactersParameters")
+    public void testSpecialCharacters(String value) throws IOException {
+        writeAndReadManifestAndReadValue(value, value);
+    }
+
+}
diff -r 9d4956ec144e test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java
--- a/test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java	Fri Apr 10 16:21:12 2020 -0400
+++ b/test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java	Mon Apr 13 19:15:14 2020 +0200
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2017, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2017, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -21,32 +21,49 @@
  * questions.
  */
 
-/*
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.stream.Collectors;
+import java.util.jar.Attributes.Name;
+import java.util.jar.Manifest;
+import java.util.jar.JarFile;
+import java.util.jar.JarEntry;
+import java.util.zip.ZipFile;
+import java.util.zip.ZipEntry;
+
+import static java.util.jar.JarFile.MANIFEST_NAME;
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static java.nio.file.StandardCopyOption.COPY_ATTRIBUTES;
+import static java.nio.file.StandardCopyOption.REPLACE_EXISTING;
+
+import jdk.test.lib.SecurityTools;
+import jdk.test.lib.util.JarUtils;
+
+/**
  * @test
- * @bug 6695402
+ * @bug 6695402 6202130
  * @summary verify signatures of jars containing classes with names
  *          with multi-byte unicode characters broken across lines
  * @library /test/lib
  */
-
-import java.io.IOException;
-import java.io.InputStream;
-import java.nio.file.Files;
-import java.nio.file.Paths;
-import java.util.jar.JarFile;
-import java.util.jar.Attributes.Name;
-import java.util.jar.JarEntry;
-
-import static java.nio.charset.StandardCharsets.UTF_8;
-
-import jdk.test.lib.SecurityTools;
-import jdk.test.lib.util.JarUtils;
-
+/*
+ * Note to future maintainers:
+ * The case this test here (LineBrokenMultiByteCharacter) verifies has been
+ * addressed in bug 8217375 with a much more general approach than with bug
+ * 6695402 and bug 8217375 came with its own much broader set of tests so that
+ * now after resolution of bug 8217375, this test here might possibly not worth
+ * to maintain any further.
+ */
 public class LineBrokenMultiByteCharacter {
 
     /**
      * this name will break across lines in MANIFEST.MF at the
-     * middle of a two-byte utf-8 encoded character due to its e acute letter
+     * middle of a two-byte UTF-8 encoded character due to its e acute letter
      * at its exact position.
      *
      * because no file with such a name exists {@link JarUtils} will add the
@@ -63,53 +80,58 @@
     static final String anotherName =
             "LineBrokenMultiByteCharacterA1234567890B1234567890C123456789D1234567890.class";
 
-    static final String alias = "a";
-    static final String keystoreFileName = "test.jks";
-    static final String manifestFileName = "MANIFEST.MF";
+    static final String ALIAS = "A";
+    static final String KEYSTORE_FILENAME = "test.jks";
+    static final String SOME_OTHER_SIG_FILE = "META-INF/FAKE_SIG.DSA";
 
     public static void main(String[] args) throws Exception {
         prepare();
 
         testSignJar("test.jar");
-        testSignJarNoManifest("test-no-manifest.jar");
         testSignJarUpdate("test-update.jar", "test-updated.jar");
     }
 
     static void prepare() throws Exception {
-        SecurityTools.keytool("-keystore", keystoreFileName, "-genkeypair",
+        SecurityTools.keytool("-keystore", KEYSTORE_FILENAME, "-genkeypair",
                 "-keyalg", "dsa",
                 "-storepass", "changeit", "-keypass", "changeit", "-storetype",
-                "JKS", "-alias", alias, "-dname", "CN=X", "-validity", "366")
+                "JKS", "-alias", ALIAS, "-dname", "CN=X", "-validity", "366")
             .shouldHaveExitValue(0);
 
-        Files.write(Paths.get(manifestFileName), (Name.
+        new File(MANIFEST_NAME).getParentFile().mkdirs();
+        Files.write(Paths.get(MANIFEST_NAME), (Name.
                 MANIFEST_VERSION.toString() + ": 1.0\r\n").getBytes(UTF_8));
+
+        // prevent jarsigner from assuming it was safe to rewrite the manifest
+        // and its line breaks assuming there were no other signatures present
+        Files.write(Paths.get(SOME_OTHER_SIG_FILE), new byte[] {});
     }
 
     static void testSignJar(String jarFileName) throws Exception {
-        JarUtils.createJar(jarFileName, manifestFileName, testClassName);
-        verifyJarSignature(jarFileName);
-    }
-
-    static void testSignJarNoManifest(String jarFileName) throws Exception {
-        JarUtils.createJar(jarFileName, testClassName);
+        JarUtils.createJar(jarFileName, testClassName, SOME_OTHER_SIG_FILE);
+        createManifestEntries(jarFileName);
+        rebreakManifest72bytes(jarFileName);
         verifyJarSignature(jarFileName);
     }
 
     static void testSignJarUpdate(
             String initialFileName, String updatedFileName) throws Exception {
-        JarUtils.createJar(initialFileName, manifestFileName, anotherName);
-        SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype",
+        JarUtils.createJar(initialFileName, testClassName, anotherName,
+                SOME_OTHER_SIG_FILE);
+        createManifestEntries(initialFileName);
+        rebreakManifest72bytes(initialFileName);
+        removeJarEntry(initialFileName, testClassName);
+        SecurityTools.jarsigner("-keystore", KEYSTORE_FILENAME, "-storetype",
                 "JKS", "-storepass", "changeit", "-debug", initialFileName,
-                alias).shouldHaveExitValue(0);
+                ALIAS).shouldHaveExitValue(0);
         JarUtils.updateJar(initialFileName, updatedFileName, testClassName);
         verifyJarSignature(updatedFileName);
     }
 
     static void verifyJarSignature(String jarFileName) throws Exception {
         // actually sign the jar
-        SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype",
-                "JKS", "-storepass", "changeit", "-debug", jarFileName, alias)
+        SecurityTools.jarsigner("-keystore", KEYSTORE_FILENAME, "-storetype",
+                "JKS", "-storepass", "changeit", "-debug", jarFileName, ALIAS)
             .shouldHaveExitValue(0);
 
         try (
@@ -130,7 +152,7 @@
      * the signature file does not even contain the desired entry at all.
      *
      * this relies on {@link java.util.jar.Manifest} breaking lines unaware
-     * of bytes that belong to the same multi-byte utf characters.
+     * of bytes that belong to the same multi-byte UTF-8 encoded characters.
      */
     static void verifyClassNameLineBroken(JarFile jar, String className)
             throws IOException {
@@ -142,7 +164,7 @@
             throw new AssertionError(className + " not found in manifest");
         }
 
-        JarEntry manifestEntry = jar.getJarEntry(JarFile.MANIFEST_NAME);
+        JarEntry manifestEntry = jar.getJarEntry(MANIFEST_NAME);
         try (
             InputStream manifestIs = jar.getInputStream(manifestEntry);
         ) {
@@ -159,7 +181,7 @@
             }
             if (bytesMatched < eAcuteBroken.length) {
                 throw new AssertionError("self-test failed: multi-byte "
-                        + "utf-8 character not broken across lines");
+                        + "UTF-8 encoded character not broken across lines");
             }
         }
     }
@@ -183,4 +205,108 @@
         }
     }
 
+    static void createManifestEntries(String jarFileName) throws Exception {
+        JarUtils.updateJarFile(Paths.get(jarFileName),
+                Paths.get("."), Paths.get(MANIFEST_NAME));
+        SecurityTools.jarsigner("-keystore", KEYSTORE_FILENAME,
+                "-storepass", "changeit", "-debug", jarFileName, ALIAS)
+                .shouldHaveExitValue(0);
+        // remove the signature files, only manifest is used
+        removeJarEntry(jarFileName,
+               "META-INF/" + ALIAS + ".SF",
+               "META-INF/" + ALIAS + ".DSA");
+    }
+
+    @SuppressWarnings("deprecation")
+    static void removeJarEntry(String jarFileName, String... entryNames)
+            throws IOException {
+        String aCopy = "swap-" + jarFileName;
+        JarUtils.updateJar(jarFileName, aCopy, Arrays.asList(entryNames)
+                .stream().collect(Collectors.toMap(e -> e, e -> false)));
+        Files.copy(Paths.get(aCopy), Paths.get(jarFileName),
+                COPY_ATTRIBUTES, REPLACE_EXISTING);
+        Files.delete(Paths.get(aCopy));
+    }
+
+    static void rebreakManifest72bytes(String jarFileName) throws Exception {
+        byte[] manifest;
+        try (ZipFile zip = new ZipFile(jarFileName)) {
+            ZipEntry zipEntry = zip.getEntry(MANIFEST_NAME);
+            manifest = zip.getInputStream(zipEntry).readAllBytes();
+        }
+        Utils.echoManifest(manifest, MANIFEST_NAME + " before re-break:");
+        byte[] manifest72 = rebreak72bytes(manifest);
+        Utils.echoManifest(manifest72, MANIFEST_NAME + " after re-break:");
+        String aCopy = "swap-" + jarFileName;
+        JarUtils.updateManifest(jarFileName, aCopy, new Manifest() { @Override
+            public void write(OutputStream out) throws IOException {
+                out.write(manifest72);
+            }
+        });
+        Files.copy(Paths.get(aCopy), Paths.get(jarFileName),
+                COPY_ATTRIBUTES, REPLACE_EXISTING);
+        Files.delete(Paths.get(aCopy));
+    }
+
+    /**
+     * Simulates a jar manifest as it would have been created by an earlier
+     * JDK by re-arranging the line break at exactly 72 bytes content thereby
+     * breaking the multi-byte UTF-8 encoded character under test like before
+     * resolution of bug 6202130.
+     * <p>
+     * The returned manifest is accepted as unmodified by
+     * {@link jdk.security.jarsigner.JarSigner#updateDigests
+     * (ZipEntry,ZipFile,MessageDigest[],Manifest)} on line 985:
+     * <pre>
+     * if (!mfDigest.equalsIgnoreCase(base64Digests[i])) {
+     * </pre>
+     * and therefore left unchanged when the jar is signed and also signature
+     * verification will check it.
+     */
+    static byte[] rebreak72bytes(byte[] mf0) {
+        byte[] mf1 = new byte[mf0.length];
+        int c0 = 0, c1 = 0; // bytes since last line start
+        for (int i0 = 0, i1 = 0; i0 < mf0.length; i0++, i1++) {
+            switch (mf0[i0]) {
+            case '\r':
+                if (i0 + 2 < mf0.length &&
+                        mf0[i0 + 1] == '\n' && mf0[i0 + 2] == ' ') {
+                    // skip line break
+                    i0 += 2;
+                    i1 -= 1;
+                } else {
+                    mf1[i1] = mf0[i0];
+                    c0 = c1 = 0;
+                }
+                break;
+            case '\n':
+                if (i0 + 1 < mf0.length && mf0[i0 + 1] == ' ') {
+                    // skip line break
+                    i0 += 1;
+                    i1 -= 1;
+                } else {
+                    mf1[i1] = mf0[i0];
+                    c0 = c1 = 0;
+                }
+                break;
+            case ' ':
+                if (c0 == 0) {
+                    continue;
+                }
+            default:
+                c0++;
+                if (c1 == 72) {
+                    mf1[i1++] = '\r';
+                    mf1[i1++] = '\n';
+                    mf1[i1++] = ' ';
+                    c1 = 1;
+                } else {
+                    c1++;
+                }
+                mf1[i1] = mf0[i0];
+            }
+        }
+        return mf1;
+    }
+
 }

Re: 6202130: Need to handle UTF-8 values and break up lines longer than 72 bytes

Reply via email to