You've again caught me only half paying attention. Below is another attempt
at micro-optimization (might be too tricky!), BUT:

- consider documenting the UTFDataFormatException and the 64k length limit
- the error message would actually be more useful if it contained a few
  characters from the head and tail of the string instead of the length
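For the second bullet, a possible shape for such a message (a sketch only; the class and method names are illustrative, not part of the patch below):

```java
public class TooLongMsgSketch {
    // Illustrative only: include a few characters from the head and tail
    // of the offending string instead of just its encoded length.
    // (For strings shorter than 16 chars the head and tail overlap,
    // which is harmless for a diagnostic.)
    static String tooLongMsg(String s, long utflen) {
        int slen = s.length();
        String head = s.substring(0, Math.min(8, slen));
        String tail = s.substring(Math.max(slen - 8, 0), slen);
        return "encoded string (" + head + "..." + tail + ") too long: "
                + utflen + " bytes";
    }

    public static void main(String[] args) {
        System.out.println(
            tooLongMsg("abcdefghij-PADDING-qrstuvwxyz", 70000L));
        // prints: encoded string (abcdefgh...stuvwxyz) too long: 70000 bytes
    }
}
```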
static int writeUTF(String str, DataOutput out) throws IOException {
    final int strlen = str.length();
    int utflen = strlen; // optimized for ASCII

    for (int i = 0; i < strlen; i++) {
        int c = str.charAt(i);
        if (c >= 0x80 || c == 0)
            utflen += (c >= 0x800) ? 2 : 1;
    }

    if (utflen > 65535 || /* overflow */ utflen < strlen)
        throw new UTFDataFormatException(tooLongMsg(str, utflen));

    final byte[] bytearr;
    if (out instanceof DataOutputStream) {
        DataOutputStream dos = (DataOutputStream)out;
        if (dos.bytearr == null || (dos.bytearr.length < (utflen + 2)))
            dos.bytearr = new byte[(utflen*2) + 2];
        bytearr = dos.bytearr;
    } else {
        bytearr = new byte[utflen + 2];
    }

    int count = 0;
    bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
    bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);

    int i = 0;
    for (i = 0; i < strlen; i++) { // optimized for initial run of ASCII
        int c = str.charAt(i);
        if (c >= 0x80 || c == 0) break;
        bytearr[count++] = (byte) c;
    }

    for (; i < strlen; i++) {
        int c = str.charAt(i);
        if (c < 0x80 && c != 0) {
            bytearr[count++] = (byte) c;
        } else if (c >= 0x800) {
            bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
            bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
            bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
        } else {
            bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
            bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
        }
    }
    out.write(bytearr, 0, utflen + 2);
    return utflen + 2;
}

private static String tooLongMsg(String s, int bits32) {
    int slen = s.length();
    // Recover the true encoded length even after int overflow: the extra
    // bytes (bits32 - slen) always fit in an unsigned 32-bit value, since
    // the expansion is at most 3x (so at most 2 extra bytes per char).
    long actualLength = (long) slen + Integer.toUnsignedLong(bits32 - slen);
    return "encoded string too long: " + actualLength + " bytes";
}

On Mon, Mar 18, 2019 at 11:36 AM Roger Riggs <roger.ri...@oracle.com> wrote:

> Hi,
>
> Collapsing a couple of emails.
>
> On 03/18/2019 01:43 PM, Martin Buchholz wrote:
>
> Hmmmm ....
>
> I took another look at this and Roger tickled my UTF-8 optimization
> spidey sense.
Rewritten using Martin style:

> static int writeUTF(String str, DataOutput out) throws IOException {
>     int strlen = str.length();
>     int utflen = strlen; // optimized for ASCII
>
>     for (int i = 0; i < strlen; i++) {
>         int c = str.charAt(i);
>         if (c >= 0x80 || c == 0)
>             utflen += (c >= 0x800) ? 2 : 1;
>     }

@Martin, you'd need utflen to be a long to avoid the original
overflow-to-negative problem for an input string length > MAXINT / 3. And
that might be the cleanest, compared to adding extra conditions in the loop.

Otherwise, this code looks good; it might optimize to fewer branches than
the original.

>     if (utflen > 65535)
>         throw new UTFDataFormatException(
>             "encoded string too long: " + utflen + " bytes");
>
>     final byte[] bytearr;
>     if (out instanceof DataOutputStream) {
>         DataOutputStream dos = (DataOutputStream)out;
>         if (dos.bytearr == null || (dos.bytearr.length < (utflen + 2)))
>             dos.bytearr = new byte[(utflen*2) + 2];
>         bytearr = dos.bytearr;
>     } else {
>         bytearr = new byte[utflen + 2];
>     }
>
>     int count = 0;
>     bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
>     bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);
>
>     int i = 0;
>     for (i = 0; i < strlen; i++) { // optimized for initial run of ASCII
>         int c = str.charAt(i);
>         if (c >= 0x80 || c == 0) break;
>         bytearr[count++] = (byte) c;
>     }
>
>     for (; i < strlen; i++) {
>         int c = str.charAt(i);
>         if (c < 0x80 && c != 0) {
>             bytearr[count++] = (byte) c;
>         } else if (c >= 0x800) {
>             bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
>             bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
>             bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
>         } else {
>             bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
>             bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
>         }
>     }
>     out.write(bytearr, 0, utflen + 2);
>     return utflen + 2;
> }

On Mon, Mar 18, 2019 at 6:41 AM Roger Riggs <roger.ri...@oracle.com> wrote:

>> Hi Brian,
>>
>> On 03/15/2019 05:06 PM, Brian Burkhalter
wrote:
>>
>> Updated: http://cr.openjdk.java.net/~bpb/8219196/webrev.01/
>>
>> I'm at odds with Martin on including utflen in the loop. (Or that's not
>> what he meant.) Now it has to do two comparisons for every input byte
>> instead of one.
>>
>> There is no need to have an early exit for input strings that are too
>> long. It is a very unusual case, probably never hit except in testing.
>>
>> There might be a case for checking that strlen <= 65535 (even at 1 input
>> byte per output byte the result would be too large), but even that is
>> probably not worth the compare and branch.
>
> The test only needs enough memory to create the input string (MAXINT/3+1)
> and a bit more for the default-sized ByteArrayOutputStream, so it should
> run in 2G configurations.
>
> So yes, removing the enabled = false is ok.
>
> Roger
>
>> Instead of disabling the test statically, you could make it conditional
>> on Runtime.maxMemory, but the test will fail quickly anyway.
>>
>> For this I simply added the requirement to the jtreg tags instead.
>>
>> With the fix, it will throw immediately before any allocation, so there
>> is no need for a restriction on memory size. The only time that would be
>> necessary would be running the test on a runtime without the fix.
>>
>> The @requires >4g restriction is unnecessary.
>>
>> $.02, Roger
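Aside, since the thread leans on it: the 65535 limit and the per-char costs come from DataOutput's modified UTF-8, where U+0001..U+007F take one byte, U+0000 and U+0080..U+07FF take two, and everything from U+0800 up takes three. A small standalone check of that arithmetic against the real DataOutputStream.writeUTF (the sample string is arbitrary):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ModifiedUtf8LenDemo {
    // Modified UTF-8 per-char cost as counted by writeUTF: note the
    // special case that U+0000 is encoded as two bytes (0xC0 0x80).
    static int utfLen(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            len += (c >= 0x01 && c <= 0x7F) ? 1 : (c <= 0x7FF) ? 2 : 3;
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        String s = "A\u0000\u00e9\u20ac";   // 1 + 2 + 2 + 3 = 8 bytes
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(s);
        System.out.println(utfLen(s));      // prints 8
        System.out.println(bos.size());     // prints 10: 2-byte length
                                            // prefix + 8 body bytes
    }
}
```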