Hi,

Collapsing a couple of emails.

On 03/18/2019 01:43 PM, Martin Buchholz wrote:
Hmmmm ....

I took another look at this and Roger tickled my UTF-8 optimization spidey sense.  Rewritten using Martin style:

    static int writeUTF(String str, DataOutput out) throws IOException {
        int strlen = str.length();
        int utflen = strlen;    // optimized for ASCII

        for (int i = 0; i < strlen; i++) {
            int c = str.charAt(i);
            if (c >= 0x80 || c == 0)
                utflen += (c >= 0x800) ? 2 : 1;
        }
@Martin, you'd need utflen to be a long to avoid the original overflow-to-negative problem for an input string length > MAXINT / 3.  And that might be the cleanest fix, compared to adding extra conditions in the loop.
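
Roughly, a sketch of what the long-based counting would look like (countUTF is a made-up name for illustration, not the patch itself):

    static long countUTF(String str) {
        int strlen = str.length();
        long utflen = strlen;   // long: can't wrap negative even at 3 bytes/char
        for (int i = 0; i < strlen; i++) {
            int c = str.charAt(i);
            if (c >= 0x80 || c == 0)
                utflen += (c >= 0x800) ? 2 : 1;
        }
        return utflen;          // caller rejects anything > 65535
    }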

Otherwise, this code looks good; it might optimize to fewer branches than the original.

        if (utflen > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: " + utflen + " bytes");

        final byte[] bytearr;
        if (out instanceof DataOutputStream) {
            DataOutputStream dos = (DataOutputStream)out;
            if (dos.bytearr == null || (dos.bytearr.length < (utflen + 2)))
                dos.bytearr = new byte[(utflen*2) + 2];
            bytearr = dos.bytearr;
        } else {
            bytearr = new byte[utflen + 2];
        }

        int count = 0;
        bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
        bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);

        int i;
        for (i = 0; i < strlen; i++) { // optimized for initial run of ASCII
            int c = str.charAt(i);
            if (c >= 0x80 || c == 0) break;
            bytearr[count++] = (byte) c;
        }

        for (; i < strlen; i++) {
            int c = str.charAt(i);
            if (c < 0x80 && c != 0) {
                bytearr[count++] = (byte) c;
            }
            else if (c >= 0x800) {
                bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
                bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            } else {
                bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            }
        }
        out.write(bytearr, 0, utflen + 2);
        return utflen + 2;
    }
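
For illustration, a hypothetical call site (the real method would sit in the same package as DataOutputStream, as the dos.bytearr access implies):

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int n = writeUTF("hello", new DataOutputStream(baos));
    // n == 7: two length-prefix bytes plus five ASCII data bytes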


On Mon, Mar 18, 2019 at 6:41 AM Roger Riggs <roger.ri...@oracle.com> wrote:

    Hi Brian,

    On 03/15/2019 05:06 PM, Brian Burkhalter wrote:
    Updated: http://cr.openjdk.java.net/~bpb/8219196/webrev.01/

    I'm at odds with Martin on including the utflen check in the loop
    (or perhaps that's not what he meant).
    Now it has to do two comparisons for every input character instead of one.

    There is no need for an early exit for input strings that are too long.
    It is a very unusual case, probably never hit except in testing.

    There might be a case for checking that strlen <= 65535 up front
    (even at one output byte per input character, a longer string would
    be too large).
    But even that is probably not worth the compare and branch.
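
    For what it's worth, that pre-check would be a one-liner (a sketch
    only, not something in the webrev):

        // At least one encoded byte per char, so strlen > 65535
        // already guarantees utflen > 65535.
        if (strlen > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: at least " + strlen + " bytes");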


The test only needs enough memory to create the input string (MAXINT/3 + 1
chars, about 1.4 GB as a char array) and a bit more for the default-sized
ByteArrayOutputStream. So it should run in 2 GB configurations.
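
Sketched out, the test amounts to something like this (names illustrative;
the enclosing method needs to declare throws IOException, and java.io.* is
assumed imported):

    String big = "a".repeat(Integer.MAX_VALUE / 3 + 1);  // ~1.4 GB of chars
    DataOutputStream dos =
        new DataOutputStream(new ByteArrayOutputStream());
    try {
        dos.writeUTF(big);
        throw new AssertionError("expected UTFDataFormatException");
    } catch (UTFDataFormatException expected) {
        // with the fix, this throws before allocating the output buffer
    }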

So yes, removing the enabled = false is ok.

Roger




    Instead of disabling the test statically, you could make it
    conditional on Runtime.maxMemory, but the test will fail quickly anyway.
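
    Something like this, that is (the 2 GB threshold is illustrative):

        // Skip unless the heap can hold the ~1.4 GB input string plus overhead.
        if (Runtime.getRuntime().maxMemory() < 2L * 1024 * 1024 * 1024) {
            System.out.println("skipping: needs ~2 GB heap");
            return;
        }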

    For this I simply added the requirement to the jtreg tags instead.
    With the fix, it will throw immediately, before any allocation, so
    there is no need for a restriction on memory size. The only time a
    memory restriction would be necessary is when running the test on a
    runtime without the fix.

    The @requires >4g restriction is unnecessary.
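
    (That is, the tag in question would look something like the following,
    assuming standard jtreg syntax; the exact threshold comes from the
    webrev, not verified here:)

        * @requires os.maxMemory > 4g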

    $.02, Roger


