On 13/05/2013 6:07 PM, Remi Forax wrote:
Hi David,

On 05/13/2013 09:12 AM, David Holmes wrote:
Thanks for all the feedback and discussion. I've rewritten the patch
to use Peter's suggestion of caching the char[] instead of the actual
String:

http://cr.openjdk.java.net/~dholmes/8013395/webrev.v3/

I too am concerned that any form of caching that avoids the array-copy
(which is the crux of the issue) is going to incur a 2x space penalty.
I can imagine old but well-behaved code that uses StringBuffer rather
than StringBuilder where very long buffers are created and toString()
only called once - and they may get immediate or subsequent OOME due
to the extra char[]; or excessive GC activity might be induced if a
lot of StringBuffers are used.

So I tried to implement a simple copy-on-write-if-shared (COWIS) scheme:

http://cr.openjdk.java.net/~dholmes/8013395/webrev.v4/

but I must have done something silly as the resulting JDK won't run -
for some reason it causes classes and resources to not be found. :(

I don't know if it's the only error but take a look to the constructor
String(char[], boolean), you will understand what's wrong.

No, that's the intended constructor - we want to share the char[] bewtween the StringBuffer and String until the StringBuffer is modified.

The other possible source of errors is that the VM has some string
optimizations that
recognize patterns like buffer.append(s).append(s2).toString().

Hmmmm, if there are intriniscs for StringBuffer then there may be a problem with getting the sharing right. But I still can't see anything to cause the bizarre behaviour I'm seeing.

I do note that my implementation is a little naive/simplistic as in some cases we will potentially copy the array twice eg once for COWIS then again to expand it for an append. :(

Thanks,
David


David
-----

Rémi


On 11/05/2013 3:36 AM, Mike Duigou wrote:
The implementation looks OK to me. I like Peter's suggestion of
caching the char[] rather than string.

I do wish that the cache could be in a soft reference but understand
that there are tradeoffs to doing so. I am worried about leaks with
this implementation.

Another possibility, to go totally nuts, is to consider implementing
a form of copy-on-write-following-toString. This would be the exact
opposite of "minimal set of changes to address this specific problem"
and probably not wise to attempt for Java 8.

Just as an FYI, this issue has in-flight conflicts with Martin's
ongoing CharSequence.getChars patch.

Mike

On May 9 2013, at 23:03 , David Holmes wrote:

Short version:

Cache the value returned by toString and use it to copy-construct a
new String on subsequent calls to toString(). Clear the cache on any
mutating operation.

webrev: http://cr.openjdk.java.net/~dholmes/8013395/webrev.v2/

Testing: microbenchmark for toString performance; new regression
test for correctness; JPRT testset core as a sanity check

Still TBD - full SE benchmark (?)

Thanks,
David
---------

Long version:

One of the goals for JDK8 is to provide a path from Java ME CDC to
Java SE (or SE Embedded). In the embedded space some pretty old
benchmarks still get used for doing comparisons between JRE's. One
of which makes heavy use of StringBuffer.toString, without modifying
the StringBuffer in between.

Up to Java 1.4.2 a StringBuffer and a String could share the
underlying char[]. This meant that toString simply needed to create
a new String that referenced the StringBuffer's char[] with no
copying of the array needed. In Java 5 the String/StringBuffer
implementations were completely revised: StringBuilder was
introduced for non-synchronized use, and a new AbstractStringBuilder
base class added for it and StringBuffer. In that implementation
toString now has to copy the StringBuffer's char[]. This resulted in
a significant performance regression for toString() and a bug -
6219959 - was opened. There is quite an elaborate evaluation in that
bug report but bottom line was that "real code doesn't depend on
this - won't fix".

At some stage ME also updated to the new Java 5 code and they also
noticed the problem. As a result CDC6 included a variation of the
caching strategy that is proposed here.

Going forward because we want people to be able to compare ME and SE
with their familiar benchmarks, we would like to address this corner
case and fix it using the caching strategy outlined. As a data point
an 8K StringBuffer that takes ~1ms to be converted to a String
initially, can process subsequent toString() calls in a few
microseconds. So that performance issue is addressed.

However we've added a write to a field in all the mutating methods
which obviously adds some additional computational effort - though I
have no doubt it is lost in the noise for all but the smallest of
mutating methods. Even so this should be run against regular SE
benchmarks to ensure there are no performance regressions there - so
if anyone has a suggestion as to the best benchmark to run to
exercise StringBuffer (if it exists), please let me know.

Thanks for reading this far :)


Reply via email to