Re: RFR for 6443578 and 6202130: UTF-8 in Manifests

Xueming Shen Wed, 02 May 2018 23:58:19 -0700

Philipp,

I kinda recalled JDK-6202130 which I read long time ago and I believedit's not a bug butjust wanted to leave it open and for double check, then it's off thepriority list somehow...

Reread the code below I think it handles utf8 and the 72-length-limitappropriately, though

a little tricky,

(1) value.getByte("utf8") + deprecated String(byte[], int, int, int);

which creates a String that each "char" inside that String objectactually represents one bytevalue of the resulted utf8 byte sequence, with its utf16 value =[hi-byte= 0 and low-byte="utf8-byte"]

(2) append this String to the StringBuffer (--> StringBuilder), now thesb.length() actually is thelength of the utf8 byte sequence, make72Safe(...) is used toadjust the length to 72 working

     on "chars" inside the buffer.

(3) write out the adjusted buffer via DataOutputStream.writeBytes()

in which the "write" cuts off the hi-byte of that utf16, so youactually get the original

     utf-8 byte sequence output to the stream.

Sure the implementation looks "interesting" and uses deprecated Stringconstructor, but it was

written 2 decades ago.

Back then I think we were thinking maybe the fix is to simply remove the"misleading" commentline below in the source code, if the implementation actually supportsutf-8 + 72-limit.

" * XXX Need to handle UTF8 values and break up lines longer than 72bytes"


Any reason/test case to believe the mechanism does not work?

BTW the logic of line below in writeMain() appears to be incorrect. I dohave another bug for it.

"            if ((version != null) && !(name.equalsIgnoreCase(vername))) {"

----------------------

                StringBuffer buffer = new StringBuffer(name);
                buffer.append(": ");

                String value = (String) e.getValue();
                if (value != null) {
                    byte[] vb = value.getBytes("UTF8");
                    value = new String(vb, 0, 0, vb.length);
                }
                buffer.append(value);

                Manifest.make72Safe(buffer);
                buffer.append("\r\n");
                out.writeBytes(buffer.toString());

-------------------------

sherman



On 5/1/18, 10:21 PM, Philipp Kunz wrote:

Hi,

Recently, I tried to fix only bug 6202130 with the intention to fixbug 6443578 later with the intention to get some opportunity forfeedback, but haven't got any, and propose now a fix for both togetherwhich in my opinion makes more sense.


See attached patch.

Some considerations, assumptions, and explanations

 * In my opinion, the code for writing manifests was distributed in the
   two classes Attributes and Manifest in an elegant way but somewhat
   difficult to explain the coherence. I chose to group the code that
   writes manifests into a new class ManifestWriter. The main incentive
   for that was to prevent or reduce duplicated code I would have had
   to change twice otherwise. This also results in a source file of a
   suitable size.
 * I could not support the assumption that the write and writeMain
   methods in Attributes couldn't be referenced anywhere so I
   deprecated them rather than having them removed.
 * I assumed the patch will not make it into JDK 10 and, hence, the
   deprecated annotations are attributed with since = 11.
 * I could not figure out any reason for the use of DataOutputStream
   and did not use it.
 * Performance-wise I assume that the code is approximately comparable
   to the previous version. The biggest improvement in this respect I
   hope comes from removing the String that contains the byte array
   constructed with deprecated String(byte[], int, int, int) and then
   copying it over again to a StringBuffer and from there to a String
   again and then Characters. On the other hand, keeping whole
   characters together when breaking lines might make it slightly
   slower. I hope my changes are an overall improvement, but I haven't
   measured it.
 * For telling first from continuation bytes of utf-8 characters apart
   I re-used a method isNotUtfContinuationByte from either StringCoding
   or UTF_8.Decoder. Unfortunately I found no way not to duplicate it.
 * Where it said before "XXX Need to handle UTF8 values and break up
   lines longer than 72 bytes" in Attributes#writeMain I did not dare
   to remove the comment completely because it still does not deal
   correctly with version headers longer than 72 bytes and the set of
   allowed values. I changed it accordingly. Two similar comments are
   removed in the patch.
 * I added two tests, WriteDeprecated and NullKeysAndValues, to
   demonstrate compatibility as good as I could. Might however not be
   desired to keep and having to maintain.
 * LineBrokenMultiByteCharacter for jarsigner should not be removed or
   not so immediately because someone might attempt to sign an older
   jarfile created without that patch with a newer jarsigner that
   already contains it.

suggested changes or additions to the bug database: (i have nopermissions to edit it myself)


 * Re-combine copies of isNotUtfContinuationByte (three by now).
   Relates to 6184334. Worth to file another issue?
 * Manifest versions have specific specifications, cannot break across
   lines and can contain a subset of characters only. Bug 6910466
   relates but is not exactly the same. If someone else is convinced
   that writing a manifest should issue a warning or any other way to
   deal with a version that does not conform to the specification, I'd
   suggest to create a separate bug for that.

Now, I would be glad if someone sponsored a review. This is only mythird attempt to submit a patch which is why I chose a lesserimportant subject to fix in order to get familiar and now I understandit's not the most attractive patch to review. Please don't hesitate tosuggest what I could do better or differently.

As a bonus, with these changes, manifest files will always bedisplayed correctly with just any utf capable viewer even if theycontain multi-byte utf characters that would have been broken across aline break with the current/previous implementation and all manifestswill become also valid strings in Java.

Re: RFR for 6443578 and 6202130: UTF-8 in Manifests

Reply via email to