Roger, For a String, the byte[] val is immutable, right? And even the current behavior of checking for negatives and then cloning would not be safe in the face of a concurrent modification, right? Is there something else going on here which I am missing?
Thanks, Brett On Mon, Jul 28, 2025 at 3:19 PM Roger Riggs <roger.ri...@oracle.com> wrote: > Hi Brett, > > Extra care is needed if the input array might be modified concurrently > with the method execution. > When control flow decisions are made based on array contents, the > integrity of the result depends on reading each byte of the array exactly > once. > > Regards, Roger > > > > On 7/27/25 4:45 PM, Brett Okken wrote: > > In String.encodeUTF8, when the coder is latin1, there is a call to > StringCoding.hasNegatives to determine if any special handling is needed. > If not, a clone of the val is returned. > If there are negative values, it then loops, from the beginning, through > all the values to handle any individual negative values. > > Would it be better to call StringCoding.countPositives? If the result > equals the length, the clone can still be returned. But if it does not, all > the values which are positive can be simply copied to the target byte[] and > only values beyond that point need to be checked again. > > > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L1287-L1300 > > if (!StringCoding.hasNegatives(val, 0, val.length)) { > return val.clone(); > } > > int dp = 0; > byte[] dst = StringUTF16.newBytesFor(val.length); > for (byte c : val) { > if (c < 0) { > dst[dp++] = (byte) (0xc0 | ((c & 0xff) >> 6)); > dst[dp++] = (byte) (0x80 | (c & 0x3f)); > } else { > dst[dp++] = c; > } > } > > > Can be changed to look like: > > int positives = StringCoding.countPositives(val, 0, val.length); > if (positives == val.length) { > return val.clone(); > } > > int dp = positives; > byte[] dst = StringUTF16.newBytesFor(val.length); > if (positives > 0) { > System.arraycopy(val, 0, dst, 0, positives); > } > for (int i=dp; i<val.length; ++i) { > byte c = val[i]; > if (c < 0) { > dst[dp++] = (byte) (0xc0 | ((c & 0xff) >> 6)); > dst[dp++] = (byte) (0x80 | (c & 0x3f)); > } else { > dst[dp++] = c; > } > } > > > > I have done a bit of testing with the StringEncode jmh benchmark on my > local windows device. > > encodeLatin1LongEnd speeds up significantly (~70%) > encodeLatin1LongStart slows down (~20%) > encodeLatin1Mixed speeds up by ~30% > > The remaining tests do not show much difference either way. > > Brett > > > > > >