Thanks to Sherman's hint, I revised the test to use the terms unicode
character and utf encoding consistently and not utf character. Affects
only the test, mostly in comments and a few identifiers.
Philipp
On Wed, 2018-10-03 at 12:48 -0700, Xueming Shen wrote:
> +1
>
> only nit is it might be preferred to use "utf8" directly in the test
> instead of current "utf"
>
> -sherman
>
> On 10/03/2018 11:55 AM, Philipp Kunz wrote:
> > In short, bug 6202130 is about removal of comments in Attributes
> > like
> >
> > > XXX Need to handle UTF8 values and break up lines longer than 72
> >
> > The patch here contains a test that shows that utf characters are
> > handled correctly, which has been the case before. Breaking lines
> > has
> > also been working before.
> >
> > In the main code, the patch only removes a few comments and does
> > not
> > change actual code. It all has been working before and now we know.
> >
> > Philipp
> >
> >
> > https://bugs.openjdk.java.net/browse/JDK-6202130
> >
> >
> >
> >
> > On Wed, 2018-09-26 at 07:08 +0200, Philipp Kunz wrote:
> > > Hi,
> > >
> > > There are some comments in the source code like this:
> > >
> > > > XXX Need to handle UTF8 values and break up lines longer
> > > than 72 bytes
> > >
> > > in Attributes.java on lines 299, 327, and 370. Bug 6202130 also
> > > suggests
> > > to add a test to show that the character set is fully supported
> > > before
> > > actually removing those comments.
> > >
> > > Hence, I added such a test, see ValueUtfEncoding in attached
> > > patch, and
> > > it finds the complete utf character set supported correctly and
> > > that it
> > > has been so already before.
> > >
> > > The comments in Attributes also say that lines longer than 72
> > > bytes need
> > > to be broken. That is a different issue and has been already
> > > addressed
> > > and solved in bug 6372077 and is now tested in LineBreakLineWidth
> > > and
> > > therefore, that line break part of the comment is obsolete now as
> > > well
> > > with the exception of version headers ("Manifest-Version" or
> > > "Signature-Version") at least when taking strictly.
> > >
> > > Version headers are not subject of bug 6202130 for which my patch
> > > intends to provide a fix and if only because multi-byte
> > > characters are
> > > not allowed in version headers and wasn't subject of 6372077
> > > either
> > > which was about too short lines rather than too long ones. If I
> > > would
> > > change only the minimum I could argue was safe, there would have
> > > to
> > > remain a comment in Attributes.java on line 327 like
> > >
> > > > XXX Need to handle version headers longer than 72 bytes
> > >
> > > but I guess that should probably better be represented as another
> > > bug
> > > rather than this style of comment if at all desired. There are
> > > some bugs
> > > concerning version headers and particularly the behavior of the
> > > manifests in their absence, among possibly others, 6910466,
> > > 4935610,
> > > 4271239, 8196371 and 4256580 which have some relation but I
> > > haven't
> > > found one that really fits the 72 byte line length limit of
> > > version
> > > headers. Whether such a bug is created or found or not, it does
> > > not
> > > directly relate to the one I'm trying to fix here, 6202130.
> > >
> > > If also the test is good I would assume that the comments can be
> > > removed now without actually changing the Manifest and Attributes
> > > implementation.
> > >
> > > Regards,
> > > Philipp
> diff -r d63efc278e93 src/java.base/share/classes/java/util/jar/Attributes.java
--- a/src/java.base/share/classes/java/util/jar/Attributes.java Thu Oct 04 10:19:01 2018 -0700
+++ b/src/java.base/share/classes/java/util/jar/Attributes.java Thu Oct 04 21:43:09 2018 +0200
@@ -296,7 +296,6 @@
/*
* Writes the current attributes to the specified data output stream.
- * XXX Need to handle UTF8 values and break up lines longer than 72 bytes
*/
@SuppressWarnings("deprecation")
void write(DataOutputStream os) throws IOException {
@@ -323,8 +322,6 @@
* Writes the current attributes to the specified data output stream,
* make sure to write out the MANIFEST_VERSION or SIGNATURE_VERSION
* attributes first.
- *
- * XXX Need to handle UTF8 values and break up lines longer than 72 bytes
*/
@SuppressWarnings("deprecation")
void writeMain(DataOutputStream out) throws IOException
@@ -367,7 +364,6 @@
/*
* Reads attributes from the specified input stream.
- * XXX Need to handle UTF8 values.
*/
void read(Manifest.FastInputStream is, byte[] lbuf) throws IOException {
read(is, lbuf, null, 0);
diff -r d63efc278e93 test/jdk/java/util/jar/Manifest/ValueUtfEncoding.java
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test/jdk/java/util/jar/Manifest/ValueUtfEncoding.java Thu Oct 04 21:43:09 2018 +0200
@@ -0,0 +1,194 @@
+/*
+ * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.util.jar.Attributes;
+import java.util.jar.Attributes.Name;
+import java.util.jar.Manifest;
+import java.util.List;
+import java.util.ArrayList;
+
+import org.testng.annotations.Test;
+import static org.testng.Assert.*;
+
+/**
+ * @test
+ * @bug 6202130
+ * @run testng ValueUtfEncoding
+ * @summary Tests complete uncode character set manifest values encoding
+ * <p>
+ * This test writes and reads a manifest that contains every valid unicode
+ * character, grouped into manifest header values with about as many
+ * bytes as allowed each, resulting in a single big manifest with.
+ * This way, all possible 1111995 unicode characters are covered in one
+ * manifest.
+ * <p>
+ * Only header values are tested. Characters for header names are much more
+ * limited and very simple ones are used just to get valid and different ones.
+ */
+public class ValueUtfEncoding {
+
+ /**
+ * From the specifications:
+ * <q>Implementations should support 65535-byte (not character) header
+ * values, and 65535 headers per file. They might run out of memory,
+ * but there should not be hard-coded limits below these values.</q>
+ *
+ * @see <a href="{@docRoot}/../specs/jar/jar.html#Notes_on_Manifest_and_Signature_Files">Notes on Manifest and Signature Files</a>
+ */
+ static final int SUPPORTED_VALUE_LENGTH = 2 << 16 - 1;
+
+ /**
+ * returns {@code true} if {@code codePoint} is explicitly forbidden in
+ * manifest values based on a statement from the specs:
+ * <pre>otherchar: any UTF-8 character except NUL, CR and LF<pre>
+ *
+ * @see <a href="{@docRoot}/../specs/jar/jar.html#Section-Specification">Jar File Specification</a>
+ */
+ static boolean isInvalidManifestValueCharacter(int codePoint) {
+ return codePoint == 0 /* NUL */
+ || codePoint == '\r' /* CR */
+ || codePoint == '\n' /* LF */;
+ };
+
+ /**
+ * Produces a list of strings with all known unicode characters except
+ * those invalid in manifest header values.
+ * Each string is filled with as many characters as fit with utf-8
+ * encoding into {@link #SUPPORTED_VALUE_LENGTH} bytes except the
+ * last string which contains just the remaining characters.
+ */
+ static List<String> produceValidManifestUnicodeCharacterValues() {
+ ArrayList<String> values = new ArrayList<>();
+ byte[] valueBuf = new byte[SUPPORTED_VALUE_LENGTH];
+ int pos = 0;
+ for (int codePoint = Character.MIN_CODE_POINT;
+ codePoint <= Character.MAX_CODE_POINT; codePoint++) {
+ if (isInvalidManifestValueCharacter(codePoint)
+ || !Character.isDefined(codePoint)) {
+ continue;
+ }
+
+ byte[] charBuf =
+ new String(Character.toChars(codePoint)).getBytes(UTF_8);
+ if (pos + charBuf.length > valueBuf.length) {
+ values.add(new String(valueBuf, 0, pos, UTF_8));
+ pos = 0;
+ }
+ System.arraycopy(charBuf, 0, valueBuf, pos, charBuf.length);
+ pos += charBuf.length;
+ }
+ if (pos > 0) {
+ values.add(new String(valueBuf, 0, pos, UTF_8));
+ }
+ return values;
+ }
+
+ /**
+ * returns simple, valid, short, and distinct manifest header names.
+ * The returned name cannot be "{@code Manifest-Version}" because the
+ * returned string does not contain "{@code -}".
+ *
+ * @param seed seed to produce distinct names
+ */
+ static String azName(int seed) {
+ StringBuffer name = new StringBuffer();
+ do {
+ name.insert(0, (char) (seed % 26 + (seed < 26 ? 'A' : 'a')));
+ seed = seed / 26 - 1;
+ } while (seed >= 0);
+ return name.toString();
+ }
+
+ /**
+ * covers writing and reading of manifests with all known unicode
+ * characters.
+ *
+ * Because the implementation uses different portions of code depending on
+ * where the value occurs to read or write, each character is tested in
+ * each of the three positions:<ul>
+ * <li>main attribute header,</li>
+ * <li>named section name, and</li>
+ * <li>named sections header values</li>
+ * </ul>
+ * Implementation of writing the main section headers in
+ * {@link Attributes#writeMain(java.io.DataOutputStream)} differs from the
+ * one writing named section headers in
+ * {@link Attributes#write(java.io.DataOutputStream)} regarding the special
+ * order of {@link Name#MANIFEST_VERSION} and
+ * {@link Name#SIGNATURE_VERSION} and also
+ * {@link Manifest#read(java.io.InputStream)} reads main sections
+ * different than reading named sections names headers in
+ * {@link Attributes#read(Manifest.FastInputStream, byte[])}.
+ */
+ @Test
+ public void testValueUtfEncoding() throws IOException {
+ Manifest mf = new Manifest();
+ mf.getMainAttributes().put(Name.MANIFEST_VERSION, "1.0");
+
+ List<String> values = produceValidManifestUnicodeCharacterValues();
+ for (int i = 0; i < values.size(); i++) {
+ String name = azName(i);
+ String value = values.get(i);
+
+ mf.getMainAttributes().put(new Name(name), value);
+ Attributes attributes = new Attributes();
+ mf.getEntries().put(value, attributes);
+ attributes.put(new Name(name), value);
+ }
+
+ mf = writeAndRead(mf);
+
+ for (int i = 0; i < values.size(); i++) {
+ String value = values.get(i);
+ String name = azName(i);
+
+ assertEquals(mf.getMainAttributes().getValue(name), value,
+ "main attributes header value");
+ Attributes attributes = mf.getAttributes(value);
+ assertNotNull(attributes, "named section not found");
+ assertEquals(attributes.getValue(name), value,
+ "named section attributes value");
+ }
+ }
+
+ static Manifest writeAndRead(Manifest mf) throws IOException {
+ ByteArrayOutputStream out = new ByteArrayOutputStream();
+ mf.write(out);
+ byte[] mfBytes = out.toByteArray();
+
+ final String sepLine = "----------------------------------------------"
+ + "--------------------------";
+ System.out.println(sepLine);
+ System.out.print(new String(mfBytes, UTF_8));
+ System.out.println(sepLine);
+
+ ByteArrayInputStream in = new ByteArrayInputStream(mfBytes);
+ return new Manifest(in);
+ }
+
+}