Please find another revised and updated patch attached.
On Sun, 2018-10-21 at 15:10 -0700, Martin Buchholz wrote: > I only took a quick look. > Looks good, but here's a nitpick - capitalize javadoc that begins > with "returns" > On Fri, Oct 19, 2018 at 1:27 AM, Philipp Kunz <philipp.kunz@paratix.c > h> wrote: > > Hi Martin and everyone, > > > > > > > > You were absolutely right to object "utf". > > > > Please find a revised and updated patch attached. > > > > > > > > Regards, > > > > Philipp > > > > > > > > > > > > > > > > On Thu, 2018-10-04 at 15:24 -0700, Martin Buchholz wrote: > > > > > "utf8" is a thing. "utf" is not. > > > > > > > > > > On Thu, Oct 4, 2018 at 12:58 PM, Philipp Kunz <philipp.kunz@parat > > ix.c > > > > > h> wrote: > > > > > > Thanks to Sherman's hint, I revised the test to use the terms > > > > > > unicode > > > > > > character and utf encoding consistently and not utf character. > > > > > > Affects > > > > > > only the test, mostly in comments and a few identifiers. > > > > > > > > > > > > Philipp > > > > > > > > > > > > > > > > > > On Wed, 2018-10-03 at 12:48 -0700, Xueming Shen wrote: > > > > > > > +1 > > > > > > > > > > > > > > only nit is it might be preferred to use "utf8" directly in > > the > > > > > > > test > > > > > > > instead of current "utf" > > > > > > > > > > > > > > -sherman > > > > > > > > > > > > > > On 10/03/2018 11:55 AM, Philipp Kunz wrote: > > > > > > > > In short, bug 6202130 is about removal of comments in > > > > > > > > Attributes > > > > > > > > like > > > > > > > > > > > > > > > > > XXX Need to handle UTF8 values and break up lines longer > > than > > > > > > > > > 72 > > > > > > > > > > > > > > > > The patch here contains a test that shows that utf > > characters > > > > > > > > are > > > > > > > > handled correctly, which has been the case before. Breaking > > > > > > > > lines > > > > > > > > has > > > > > > > > also been working before. > > > > > > > > > > > > > > > > In the main code, the patch only removes a few comments and > > > > > > > > does > > > > > > > > not > > > > > > > > change actual code. It all has been working before and now > > we > > > > > > > > know. > > > > > > > > > > > > > > > > Philipp > > > > > > > > > > > > > > > > > > > > > > > > https://bugs.openjdk.java.net/browse/JDK-6202130 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, 2018-09-26 at 07:08 +0200, Philipp Kunz wrote: > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > There are some comments in the source code like this: > > > > > > > > > > > > > > > > > > > XXX Need to handle UTF8 values and break up lines > > longer > > > > > > > > > than 72 bytes > > > > > > > > > > > > > > > > > > in Attributes.java on lines 299, 327, and 370. Bug > > 6202130 > > > > > > > > > also > > > > > > > > > suggests > > > > > > > > > to add a test to show that the character set is fully > > > > > > > > > supported > > > > > > > > > before > > > > > > > > > actually removing those comments. > > > > > > > > > > > > > > > > > > Hence, I added such a test, see ValueUtfEncoding in > > attached > > > > > > > > > patch, and > > > > > > > > > it finds the complete utf character set supported > > correctly > > > > > > > > > and > > > > > > > > > that it > > > > > > > > > has been so already before. > > > > > > > > > > > > > > > > > > The comments in Attributes also say that lines longer > > than 72 > > > > > > > > > bytes need > > > > > > > > > to be broken. That is a different issue and has been > > already > > > > > > > > > addressed > > > > > > > > > and solved in bug 6372077 and is now tested in > > > > > > > > > LineBreakLineWidth > > > > > > > > > and > > > > > > > > > therefore, that line break part of the comment is > > obsolete > > > > > > > > > now as > > > > > > > > > well > > > > > > > > > with the exception of version headers ("Manifest-Version" > > or > > > > > > > > > "Signature-Version") at least when taking strictly. > > > > > > > > > > > > > > > > > > Version headers are not subject of bug 6202130 for which > > my > > > > > > > > > patch > > > > > > > > > intends to provide a fix and if only because multi-byte > > > > > > > > > characters are > > > > > > > > > not allowed in version headers and wasn't subject of > > 6372077 > > > > > > > > > either > > > > > > > > > which was about too short lines rather than too long > > ones. If > > > > > > > > > I > > > > > > > > > would > > > > > > > > > change only the minimum I could argue was safe, there > > would > > > > > > > > > have > > > > > > > > > to > > > > > > > > > remain a comment in Attributes.java on line 327 like > > > > > > > > > > > > > > > > > > > XXX Need to handle version headers longer than 72 > > bytes > > > > > > > > > > > > > > > > > > but I guess that should probably better be represented as > > > > > > > > > another > > > > > > > > > bug > > > > > > > > > rather than this style of comment if at all desired. > > There > > > > > > > > > are > > > > > > > > > some bugs > > > > > > > > > concerning version headers and particularly the behavior > > of > > > > > > > > > the > > > > > > > > > manifests in their absence, among possibly others, > > 6910466, > > > > > > > > > 4935610, > > > > > > > > > 4271239, 8196371 and 4256580 which have some relation but > > I > > > > > > > > > haven't > > > > > > > > > found one that really fits the 72 byte line length limit > > of > > > > > > > > > version > > > > > > > > > headers. Whether such a bug is created or found or not, > > it > > > > > > > > > does > > > > > > > > > not > > > > > > > > > directly relate to the one I'm trying to fix here, > > 6202130. > > > > > > > > > > > > > > > > > > If also the test is good I would assume that the comments > > can > > > > > > > > > be > > > > > > > > > removed now without actually changing the Manifest and > > > > > > > > > Attributes > > > > > > > > > implementation. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Philipp > >
diff -r 5e3a8f387701 src/java.base/share/classes/java/util/jar/Attributes.java --- a/src/java.base/share/classes/java/util/jar/Attributes.java Mon Oct 22 13:31:42 2018 -0700 +++ b/src/java.base/share/classes/java/util/jar/Attributes.java Mon Oct 22 23:54:42 2018 +0200 @@ -296,7 +296,6 @@ /* * Writes the current attributes to the specified data output stream. - * XXX Need to handle UTF8 values and break up lines longer than 72 bytes */ @SuppressWarnings("deprecation") void write(DataOutputStream os) throws IOException { @@ -323,8 +322,6 @@ * Writes the current attributes to the specified data output stream, * make sure to write out the MANIFEST_VERSION or SIGNATURE_VERSION * attributes first. - * - * XXX Need to handle UTF8 values and break up lines longer than 72 bytes */ @SuppressWarnings("deprecation") void writeMain(DataOutputStream out) throws IOException @@ -367,7 +364,6 @@ /* * Reads attributes from the specified input stream. - * XXX Need to handle UTF8 values. */ void read(Manifest.FastInputStream is, byte[] lbuf) throws IOException { read(is, lbuf, null, 0); diff -r 5e3a8f387701 test/jdk/java/util/jar/Manifest/ValueUtf8Coding.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/jdk/java/util/jar/Manifest/ValueUtf8Coding.java Mon Oct 22 23:54:42 2018 +0200 @@ -0,0 +1,202 @@ +/* + * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. + * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +import static java.nio.charset.StandardCharsets.UTF_8; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.jar.Attributes; +import java.util.jar.Attributes.Name; +import java.util.jar.Manifest; +import java.util.List; +import java.util.ArrayList; + +import org.testng.annotations.Test; +import static org.testng.Assert.*; + +/** + * @test + * @bug 6202130 + * @run testng ValueUtf8Coding + * @summary Tests that UTF-8 encoding and decoding manifest header values + * works well with the complete unicode character set. + */ +public class ValueUtf8Coding { + + /** + * Maximum number of bytes of UTF-8 encoded characters in one header value. + * <p> + * There are too many unicode characters (code points) to fit all into one + * manifest value. The specifications state: + * <q>Implementations should support 65535-byte (not character) header + * values, and 65535 headers per file. They might run out of memory, + * but there should not be hard-coded limits below these values.</q> + * + * @see <a + * href="{@docRoot}/../specs/jar/jar.html#Notes_on_Manifest_and_Signature_Files"> + * Notes on Manifest and Signature Files</a> + */ + static final int SUPPORTED_VALUE_LENGTH = 65535; + + /** + * Returns {@code true} if {@code codePoint} is known not to be a supported + * character in manifest header values. Explicitly forbidden in manifest + * header values are according to a statement from the specifications: + * <q>otherchar: any UTF-8 character except NUL, CR and LF</q>. + * {@code NUL} ({@code 0x0}), however, works just fine and might have been + * used and might still be. + * + * @see <a href="{@docRoot}/../specs/jar/jar.html#Section-Specification"> + * Jar File Specification</a> + */ + static boolean isUnsupportedManifestValueCharacter(int codePoint) { + return codePoint == '\r' /* CR */ || codePoint == '\n' /* LF */; + }; + + /** + * Produces a list of strings with all unicode characters except those + * explicitly invalid in manifest header values. + * Each string is filled with as many characters as fit into + * {@link #SUPPORTED_VALUE_LENGTH} bytes with UTF-8 encoding except the + * last string which contains the remaining characters. + */ + static List<String> produceUnicodeCharacterValues() { + ArrayList<String> values = new ArrayList<>(); + byte[] valueBuf = new byte[SUPPORTED_VALUE_LENGTH]; + int pos = 0; + for (int codePoint = Character.MIN_CODE_POINT; + codePoint <= Character.MAX_CODE_POINT; codePoint++) { + if (isUnsupportedManifestValueCharacter(codePoint)) { + continue; + } + + byte[] charBuf = Character.toString(codePoint).getBytes(UTF_8); + if (pos + charBuf.length > valueBuf.length) { + values.add(new String(valueBuf, 0, pos, UTF_8)); + pos = 0; + } + System.arraycopy(charBuf, 0, valueBuf, pos, charBuf.length); + pos += charBuf.length; + } + if (pos > 0) { + values.add(new String(valueBuf, 0, pos, UTF_8)); + } + return values; + } + + /** + * Returns simple, valid, short, and distinct manifest header names. + * The returned name cannot collide with "{@code Manifest-Version}" because + * the returned string does not contain "{@code -}". + */ + static Name azName(int seed) { + StringBuffer name = new StringBuffer(); + do { + name.insert(0, (char) (seed % 26 + (seed < 26 ? 'A' : 'a'))); + seed = seed / 26 - 1; + } while (seed >= 0); + return new Name(name.toString()); + } + + /** + * Writes and reads a manifest with the complete unicode character set. + * The characters are grouped into manifest header values with about as + * many bytes as allowed each, utilizing a single big manifest. + * <p> + * This test assumes that a manifest is encoded and decoded correctly if + * writing and then reading it again results in a manifest with identical + * values as the original. The test is not about other aspects of writing + * and reading manifests than only that, given the fact and the way it + * works for some characters such as the most widely and often used ones, + * it also works for the complete unicode character set just the same. + * <p> + * Only header values are tested. The set of allowed characters for header + * names are much more limited and are a different topic entirely and most + * simple ones are used here as necessary just to get valid and different + * ones (see {@link #azName}). + * <p> + * Because the current implementation under test uses different portions + * of code depending on where the value occurs to read or write, each + * character is tested in each of the three positions:<ul> + * <li>main attribute header,</li> + * <li>named section name, and</li> + * <li>named sections header values</li> + * </ul> + * Implementation of writing the main section headers in + * {@link Attributes#writeMain(java.io.DataOutputStream)} differs from the + * one writing named section headers in + * {@link Attributes#write(java.io.DataOutputStream)} regarding the special + * order of {@link Name#MANIFEST_VERSION} and + * {@link Name#SIGNATURE_VERSION} and also + * {@link Manifest#read(java.io.InputStream)} at least potentially reads + * main sections differently than reading named sections names headers in + * {@link Attributes#read(Manifest.FastInputStream, byte[])}. + */ + @Test + public void testUnicodeCharacterSet() throws IOException { + Manifest mf = new Manifest(); + mf.getMainAttributes().put(Name.MANIFEST_VERSION, "1.0"); + + List<String> values = produceUnicodeCharacterValues(); + for (int i = 0; i < values.size(); i++) { + Name name = azName(i); + String value = values.get(i); + + mf.getMainAttributes().put(name, value); + Attributes attributes = new Attributes(); + mf.getEntries().put(value, attributes); + attributes.put(name, value); + } + + mf = writeAndRead(mf); + + for (int i = 0; i < values.size(); i++) { + String value = values.get(i); + Name name = azName(i); + + assertEquals(mf.getMainAttributes().getValue(name), value, + "main attributes header value"); + Attributes attributes = mf.getAttributes(value); + assertNotNull(attributes, "named section"); + assertEquals(attributes.getValue(name), value, + "named section attributes value"); + } + } + + static Manifest writeAndRead(Manifest mf) throws IOException { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + mf.write(out); + byte[] mfBytes = out.toByteArray(); + + String sepLine = "----------------------------------------------" + + "--------------------------"; + System.out.println(sepLine); + System.out.print(new String(mfBytes, UTF_8)); + System.out.println(sepLine); + + ByteArrayInputStream in = new ByteArrayInputStream(mfBytes); + return new Manifest(in); + } + +}