Hi,

Collapsing a couple of emails.

On 03/18/2019 01:43 PM, Martin Buchholz wrote:
Hmmmm ....

I took another look at this and Roger tickled my UTF-8 optimization spidey sense.  Rewritten using Martin style:

    static int writeUTF(String str, DataOutput out) throws IOException {
        int strlen = str.length();
        int utflen = strlen;    // optimized for ASCII

        for (int i = 0; i < strlen; i++) {
            int c = str.charAt(i);
            if (c >= 0x80 || c == 0)
                utflen += (c >= 0x800) ? 2 : 1;
        }
@Martin, you'd need utflen to be a long to avoid the original overflow-to-negative problem for an input string length > MAXINT / 3.  And that might be the cleanest fix, compared to adding extra conditions in the loop.
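
Roughly, a sketch of what the long-based counting would look like (countUTF is a made-up name for illustration, not the patch itself):

    static long countUTF(String str) {
        int strlen = str.length();
        long utflen = strlen;   // long: can't wrap negative even at 3 bytes/char
        for (int i = 0; i < strlen; i++) {
            int c = str.charAt(i);
            if (c >= 0x80 || c == 0)
                utflen += (c >= 0x800) ? 2 : 1;
        }
        return utflen;          // caller rejects anything > 65535
    }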

Otherwise, this code looks good; it might optimize to fewer branches than the original.

        if (utflen > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: " + utflen + " bytes");

        final byte[] bytearr;
        if (out instanceof DataOutputStream) {
            DataOutputStream dos = (DataOutputStream)out;
            if (dos.bytearr == null || (dos.bytearr.length < (utflen + 2)))
                dos.bytearr = new byte[(utflen*2) + 2];
            bytearr = dos.bytearr;
        } else {
            bytearr = new byte[utflen + 2];
        }

        int count = 0;
        bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
        bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);

        int i;
        for (i = 0; i < strlen; i++) { // optimized for initial run of ASCII
            int c = str.charAt(i);
            if (c >= 0x80 || c == 0) break;
            bytearr[count++] = (byte) c;
        }

        for (; i < strlen; i++) {
            int c = str.charAt(i);
            if (c < 0x80 && c != 0) {
                bytearr[count++] = (byte) c;
            }
            else if (c >= 0x800) {
                bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
                bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            } else {
                bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            }
        }
        out.write(bytearr, 0, utflen + 2);
        return utflen + 2;
    }
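
For illustration, a hypothetical call site (the real method would sit in the same package as DataOutputStream, as the dos.bytearr access implies):

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int n = writeUTF("hello", new DataOutputStream(baos));
    // n == 7: two length-prefix bytes plus five ASCII data bytes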


On Mon, Mar 18, 2019 at 6:41 AM Roger Riggs <roger.ri...@oracle.com> wrote:

    Hi Brian,

    On 03/15/2019 05:06 PM, Brian Burkhalter wrote:
    Updated: http://cr.openjdk.java.net/~bpb/8219196/webrev.01/

    I'm at odds with Martin on including the utflen check in the loop
    (or perhaps that's not what he meant).
    Now it has to do two comparisons for every input character instead of one.

    There is no need for an early exit for input strings that are too long.
    It is a very unusual case, probably never hit except in testing.

    There might be a case for checking that strlen <= 65535 up front
    (even at one output byte per input character, a longer string would
    be too large).
    But even that is probably not worth the compare and branch.
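
    For what it's worth, that pre-check would be a one-liner (a sketch
    only, not something in the webrev):

        // At least one encoded byte per char, so strlen > 65535
        // already guarantees utflen > 65535.
        if (strlen > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: at least " + strlen + " bytes");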


The test only needs enough memory to create the input string (MAXINT/3 + 1
chars, about 1.4 GB as a char array) and a bit more for the default-sized
ByteArrayOutputStream. So it should run in 2 GB configurations.
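
Sketched out, the test amounts to something like this (names illustrative;
the enclosing method needs to declare throws IOException, and java.io.* is
assumed imported):

    String big = "a".repeat(Integer.MAX_VALUE / 3 + 1);  // ~1.4 GB of chars
    DataOutputStream dos =
        new DataOutputStream(new ByteArrayOutputStream());
    try {
        dos.writeUTF(big);
        throw new AssertionError("expected UTFDataFormatException");
    } catch (UTFDataFormatException expected) {
        // with the fix, this throws before allocating the output buffer
    }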

So yes, removing the enabled = false is ok.

Roger




    Instead of disabling the test statically, you could make it
    conditional on Runtime.maxMemory, but the test will fail quickly anyway.
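
    Something like this, that is (the 2 GB threshold is illustrative):

        // Skip unless the heap can hold the ~1.4 GB input string plus overhead.
        if (Runtime.getRuntime().maxMemory() < 2L * 1024 * 1024 * 1024) {
            System.out.println("skipping: needs ~2 GB heap");
            return;
        }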

    For this I simply added the requirement to the jtreg tags instead.
    With the fix, it will throw immediately, before any allocation, so
    there is no need for a restriction on memory size. The only time a
    memory restriction would be necessary is when running the test on a
    runtime without the fix.

    The @requires >4g restriction is unnecessary.
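
    (That is, the tag in question would look something like the following,
    assuming standard jtreg syntax; the exact threshold comes from the
    webrev, not verified here:)

        * @requires os.maxMemory > 4g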

    $.02, Roger


