Hi Raffaello, I think the changes look reasonable.
I have created a webrev, http://cr.openjdk.java.net/~lancea/8222187/webrev.00/ <http://cr.openjdk.java.net/~lancea/8222187/webrev.00/>, for others to review as it is a bit easier than the patch. I have also run the JCK tests in this area as well as mach5 tiers1-3 (which I believe Roger has also) Best Lance > On Jul 15, 2020, at 5:44 PM, Raffaello Giulietti > <raffaello.giulie...@gmail.com> wrote: > > Hi Roger, > > On 2020-07-15 22:21, Roger Riggs wrote: >> Hi Raffaello, >> Base64.java: >> 2: Please update 2nd copyright year to 2020 > > I was unsure what to do here, thanks for guidance. > I also removed the seemingly useless import at former L33. > > > >> leftovers(): >> 996: "&" the shortcutting && is more typical in a condition. >> 997: the -= is buried in an expression and a reader might miss it. > > Corrected, as well as the analogous -= usage for wpos now at L1106-7 > > >> 1053: "can be reallocated to other vars": not a useful comment, reflecting >> on only one possible implementation that is out of the control of the >> developer. >> I'd almost rather see 'len' than 'limit - off'; it might be informative to >> the reader if 'limit' was declared final. (1056:) > > Done, as well as declaring i and v final in the loop. > > > >> TestBase54.java: >> 2: Update 2nd copyright year to 2020 >> 27: Please add the bug number to the @bug line. >> Style-wise, I would remove the blank lines at the beginning of blocks. >> (1095, 1115) > > Done. > > >> Thanks, Roger >> On 7/14/20 11:47 AM, Raffaello Giulietti wrote: >>> Hi Roger, >>> >>> here's the latest version of the patch. >>> >>> I did: >>> * Withdraw the simplification in encodedOutLength(), as it is indeed out of >>> scope for this bug fix. >>> * Restore field names in DecInputStream except for nextin (now wpos) and >>> nextout (now rpos) because they have slightly different semantics. That's >>> just for clarity. I would have nothing against reusing the old names for >>> the proposed purpose. >>> * Switch back to the original no-arg read() implementation. >>> * Adjust comment continuation lines. >>> * Preserve the proposed logic of read(byte[], int, int) and the supporting >>> methods. >>> >>> Others needed changes are: >>> * Correct the copyright years: that's something better left to Oracle. >>> * Remove the apparently useless import at L33. I could build and run >>> without it. >>> >>> >>> Greetings >>> Raffaello >>> > > > ---- > > # HG changeset patch > # User lello > # Date 1594848775 -7200 > # Wed Jul 15 23:32:55 2020 +0200 > # Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68 > # Parent a5649ebf94193770ca763391dd63807059d333b4 > 8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end > Reviewed-by: TBD > Contributed-by: Raffaello Giulietti <raffaello.giulie...@gmail.com > <mailto:raffaello.giulie...@gmail.com>> > > diff --git a/src/java.base/share/classes/java/util/Base64.java > b/src/java.base/share/classes/java/util/Base64.java > --- a/src/java.base/share/classes/java/util/Base64.java > +++ b/src/java.base/share/classes/java/util/Base64.java > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -30,7 +30,6 @@ > import java.io.IOException; > import java.io.OutputStream; > import java.nio.ByteBuffer; > -import java.nio.charset.StandardCharsets; > > import sun.nio.cs.ISO_8859_1; > > @@ -961,12 +960,15 @@ > > private final InputStream is; > private final boolean isMIME; > - private final int[] base64; // base64 -> byte mapping > - private int bits = 0; // 24-bit buffer for decoding > - private int nextin = 18; // next available "off" in "bits" > for input; > - // -> 18, 12, 6, 0 > - private int nextout = -8; // next available "off" in "bits" > for output; > - // -> 8, 0, -8 (no byte for output) > + private final int[] base64; // base64 -> byte mapping > + private int bits = 0; // 24-bit buffer for decoding > + > + /* writing bit pos inside bits; one of 24 (left, msb), 18, 12, 6, 0 > */ > + private int wpos = 0; > + > + /* reading bit pos inside bits: one of 24 (left, msb), 16, 8, 0 */ > + private int rpos = 0; > + > private boolean eof = false; > private boolean closed = false; > > @@ -983,107 +985,153 @@ > return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff; > } > > - private int eof(byte[] b, int off, int len, int oldOff) > - throws IOException > - { > + private int leftovers(byte[] b, int off, int pos, int limit) { > eof = true; > - if (nextin != 18) { > - if (nextin == 12) > - throw new IOException("Base64 stream has one un-decoded > dangling byte."); > - // treat ending xx/xxx without padding character legal. > - // same logic as v == '=' below > - b[off++] = (byte)(bits >> (16)); > - if (nextin == 0) { // only one padding byte > - if (len == 1) { // no enough output space > - bits >>= 8; // shift to lowest byte > - nextout = 0; > - } else { > - b[off++] = (byte) (bits >> 8); > - } > - } > + > + /* > + * We use a loop here, as this method is executed only a few > times. > + * Unrolling the loop would probably not contribute much here. > + */ > + while (rpos - 8 >= wpos && pos != limit) { > + rpos -= 8; > + b[pos++] = (byte) (bits >> rpos); > } > - return off == oldOff ? -1 : off - oldOff; > + return pos - off != 0 || rpos - 8 >= wpos ? pos - off : -1; > } > > - private int padding(byte[] b, int off, int len, int oldOff) > - throws IOException > - { > - // = shiftto==18 unnecessary padding > - // x= shiftto==12 dangling x, invalid unit > - // xx= shiftto==6 && missing last '=' > - // xx=y or last is not '=' > - if (nextin == 18 || nextin == 12 || > - nextin == 6 && is.read() != '=') { > - throw new IOException("Illegal base64 ending sequence:" + > nextin); > + private int eof(byte[] b, int off, int pos, int limit) throws > IOException { > + /* > + * pos != limit > + * > + * wpos == 18: x dangling single x, invalid unit > + * accept ending xx or xxx without padding characters > + */ > + if (wpos == 18) { > + throw new IOException("Base64 stream has one un-decoded > dangling byte."); > } > - b[off++] = (byte)(bits >> (16)); > - if (nextin == 0) { // only one padding byte > - if (len == 1) { // no enough output space > - bits >>= 8; // shift to lowest byte > - nextout = 0; > - } else { > - b[off++] = (byte) (bits >> 8); > - } > + rpos = 24; > + return leftovers(b, off, pos, limit); > + } > + > + private int padding(byte[] b, int off, int pos, int limit) throws > IOException { > + /* > + * pos != limit > + * > + * wpos == 24: = (unnecessary padding) > + * wpos == 18: x= (dangling single x, invalid unit) > + * wpos == 12 and missing last '=': xx= (invalid padding) > + * wpos == 12 and last is not '=': xx=x (invalid padding) > + */ > + if (wpos >= 18 || wpos == 12 && is.read() != '=') { > + throw new IOException("Illegal base64 ending sequence:" + > wpos); > } > - eof = true; > - return off - oldOff; > + rpos = 24; > + return leftovers(b, off, pos, limit); > } > > @Override > public int read(byte[] b, int off, int len) throws IOException { > - if (closed) > + if (closed) { > throw new IOException("Stream is closed"); > - if (eof && nextout < 0) // eof and no leftover > - return -1; > - if (off < 0 || len < 0 || len > b.length - off) > - throw new IndexOutOfBoundsException(); > - int oldOff = off; > - while (nextout >= 0) { // leftover output byte(s) in bits > buf > - if (len == 0) > - return off - oldOff; > - b[off++] = (byte)(bits >> nextout); > - len--; > - nextout -= 8; > + } > + Objects.checkFromIndexSize(off, len, b.length); > + if (len == 0) { > + return 0; > } > - bits = 0; > - while (len > 0) { > - int v = is.read(); > - if (v == -1) { > - return eof(b, off, len, oldOff); > + > + /* > + * Rather than keeping 2 running vars (e.g., off and len), > + * we only keep one (pos), while definitely fixing the boundaries > + * of the range [off, limit). > + * More specifically, each use of pos as an index in b meets > + * pos - off >= 0 & limit - pos > 0 > + * > + * Note that limit can overflow to Integer.MIN_VALUE. However, > + * as long as comparisons with pos are as coded, there's no harm. > + */ > + int pos = off; > + final int limit = off + len; > + if (eof) { > + return leftovers(b, off, pos, limit); > + } > + > + /* > + * Leftovers from previous invocation; here, wpos = 0. > + * There can be at most 2 leftover bytes (rpos <= 16). > + * Further, b has at least one free place. > + * > + * The logic could be coded as a loop, (as in method leftovers()) > + * but the explicit "unrolling" makes it possible to generate > + * better byte extraction code. > + */ > + if (rpos == 16) { > + b[pos++] = (byte) (bits >> 8); > + rpos = 8; > + if (pos == limit) { > + return len; > } > - if ((v = base64[v]) < 0) { > - if (v == -2) { // padding byte(s) > - return padding(b, off, len, oldOff); > - } > - if (v == -1) { > - if (!isMIME) > - throw new IOException("Illegal base64 character > " + > - Integer.toString(v, 16)); > - continue; // skip if for rfc2045 > - } > - // neve be here > - } > - bits |= (v << nextin); > - if (nextin == 0) { > - nextin = 18; // clear for next in > - b[off++] = (byte)(bits >> 16); > - if (len == 1) { > - nextout = 8; // 2 bytes left in bits > - break; > - } > - b[off++] = (byte)(bits >> 8); > - if (len == 2) { > - nextout = 0; // 1 byte left in bits > - break; > - } > - b[off++] = (byte)bits; > - len -= 3; > - bits = 0; > - } else { > - nextin -= 6; > + } > + if (rpos == 8) { > + b[pos++] = (byte) bits; > + rpos = 0; > + if (pos == limit) { > + return len; > } > } > - return off - oldOff; > + > + bits = 0; > + wpos = 24; > + for (;;) { > + /* pos != limit & rpos == 0 */ > + final int i = is.read(); > + if (i < 0) { > + return eof(b, off, pos, limit); > + } > + final int v = base64[i]; > + if (v < 0) { > + /* > + * i not in alphabet, thus > + * v == -2: i is '=', the padding > + * v == -1: i is something else, typically CR or LF > + */ > + if (v == -1) { > + if (isMIME) { > + continue; > + } > + throw new IOException("Illegal base64 character 0x" + > + Integer.toHexString(i)); > + } > + return padding(b, off, pos, limit); > + } > + wpos -= 6; > + bits |= v << wpos; > + if (wpos != 0) { > + continue; > + } > + if (limit - pos >= 3) { > + /* frequently taken fast path, no need to track rpos */ > + b[pos++] = (byte) (bits >> 16); > + b[pos++] = (byte) (bits >> 8); > + b[pos++] = (byte) bits; > + bits = 0; > + wpos = 24; > + if (pos == limit) { > + return len; > + } > + continue; > + } > + > + /* b has either 1 or 2 free places */ > + b[pos++] = (byte) (bits >> 16); > + if (pos == limit) { > + rpos = 16; > + return len; > + } > + b[pos++] = (byte) (bits >> 8); > + /* pos == limit, no need for an if */ > + rpos = 8; > + return len; > + } > } > > @Override > diff --git a/test/jdk/java/util/Base64/TestBase64.java > b/test/jdk/java/util/Base64/TestBase64.java > --- a/test/jdk/java/util/Base64/TestBase64.java > +++ b/test/jdk/java/util/Base64/TestBase64.java > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2012, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -24,7 +24,7 @@ > /** > * @test > * @bug 4235519 8004212 8005394 8007298 8006295 8006315 8006530 8007379 > 8008925 > - * 8014217 8025003 8026330 8028397 8129544 8165243 8176379 > + * 8014217 8025003 8026330 8028397 8129544 8165243 8176379 8222187 > * @summary tests java.util.Base64 > * @library /test/lib > * @build jdk.test.lib.RandomFactory > @@ -144,6 +144,10 @@ > testDecoderKeepsAbstinence(Base64.getDecoder()); > testDecoderKeepsAbstinence(Base64.getUrlDecoder()); > testDecoderKeepsAbstinence(Base64.getMimeDecoder()); > + > + // tests patch addressing JDK-8222187 > + // https://bugs.openjdk.java.net/browse/JDK-8222187 > + testJDK_8222187(); > } > > private static void test(Base64.Encoder enc, Base64.Decoder dec, > @@ -607,4 +611,26 @@ > } > } > } > + > + private static void testJDK_8222187() throws Throwable { > + byte[] orig = "12345678".getBytes("US-ASCII"); > + byte[] encoded = Base64.getEncoder().encode(orig); > + // decode using different buffer sizes, up to a longer one than > needed > + for (int bufferSize = 1; bufferSize <= encoded.length + 1; > bufferSize++) { > + try ( > + InputStream in = Base64.getDecoder().wrap( > + new ByteArrayInputStream(encoded)); > + ByteArrayOutputStream baos = new ByteArrayOutputStream(); > + ) { > + byte[] buffer = new byte[bufferSize]; > + int read; > + while ((read = in.read(buffer, 0, bufferSize)) >= 0) { > + baos.write(buffer, 0, read); > + } > + // compare result, output info if lengths do not match > + byte[] decoded = baos.toByteArray(); > + checkEqual(decoded, orig, "Base64 stream decoding failed!"); > + } > + } > + } > } > <JDK-8222187.patch> <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>