Hi David,
One remote incompatibility note: the String returned from
StringBuffer.toString() is retained by StringBuffer until the next call
to StringBuffer mutating method. This can be observed for example if the
returned String object is wrapped by a WeakReference. This is really a
remotely incompatible difference in external behaviour, but it could be
fixed by the following variation of toString():
private transient char[] toStringCache;
@Override
public synchronized String toString() {
if (toStringCache == null) {
toStringCache = Arrays.copyOfRange(value, 0, count);
}
return new String(toStringCache, true);
}
Regards, Peter
On 05/10/2013 08:03 AM, David Holmes wrote:
Short version:
Cache the value returned by toString and use it to copy-construct a
new String on subsequent calls to toString(). Clear the cache on any
mutating operation.
webrev: http://cr.openjdk.java.net/~dholmes/8013395/webrev.v2/
Testing: microbenchmark for toString performance; new regression test
for correctness; JPRT testset core as a sanity check
Still TBD - full SE benchmark (?)
Thanks,
David
---------
Long version:
One of the goals for JDK8 is to provide a path from Java ME CDC to
Java SE (or SE Embedded). In the embedded space some pretty old
benchmarks still get used for doing comparisons between JRE's. One of
which makes heavy use of StringBuffer.toString, without modifying the
StringBuffer in between.
Up to Java 1.4.2 a StringBuffer and a String could share the
underlying char[]. This meant that toString simply needed to create a
new String that referenced the StringBuffer's char[] with no copying
of the array needed. In Java 5 the String/StringBuffer implementations
were completely revised: StringBuilder was introduced for
non-synchronized use, and a new AbstractStringBuilder base class added
for it and StringBuffer. In that implementation toString now has to
copy the StringBuffer's char[]. This resulted in a significant
performance regression for toString() and a bug - 6219959 - was
opened. There is quite an elaborate evaluation in that bug report but
bottom line was that "real code doesn't depend on this - won't fix".
At some stage ME also updated to the new Java 5 code and they also
noticed the problem. As a result CDC6 included a variation of the
caching strategy that is proposed here.
Going forward because we want people to be able to compare ME and SE
with their familiar benchmarks, we would like to address this corner
case and fix it using the caching strategy outlined. As a data point
an 8K StringBuffer that takes ~1ms to be converted to a String
initially, can process subsequent toString() calls in a few
microseconds. So that performance issue is addressed.
However we've added a write to a field in all the mutating methods
which obviously adds some additional computational effort - though I
have no doubt it is lost in the noise for all but the smallest of
mutating methods. Even so this should be run against regular SE
benchmarks to ensure there are no performance regressions there - so
if anyone has a suggestion as to the best benchmark to run to exercise
StringBuffer (if it exists), please let me know.
Thanks for reading this far :)