Thanks for all the feedback and discussion. I've rewritten the patch to use Peter's suggestion of caching the char[] instead of the actual String:

http://cr.openjdk.java.net/~dholmes/8013395/webrev.v3/

I too am concerned that any form of caching that avoids the array-copy (which is the crux of the issue) is going to incur a 2x space penalty. I can imagine old but well-behaved code that uses StringBuffer rather than StringBuilder where very long buffers are created and toString() only called once - and they may get immediate or subsequent OOME due to the extra char[]; or excessive GC activity might be induced if a lot of StringBuffers are used.

So I tried to implement a simple copy-on-write-if-shared (COWIS) scheme:

http://cr.openjdk.java.net/~dholmes/8013395/webrev.v4/

but I must have done something silly as the resulting JDK won't run - for some reason it causes classes and resources to not be found. :(

David
-----

On 11/05/2013 3:36 AM, Mike Duigou wrote:
The implementation looks OK to me. I like Peter's suggestion of caching the 
char[] rather than string.

I do wish that the cache could be in a soft reference but understand that there 
are tradeoffs to doing so. I am worried about leaks with this implementation.

Another possibility, to go totally nuts, is to consider implementing a form of 
copy-on-write-following-toString. This would be the exact opposite of "minimal set 
of changes to address this specific problem" and probably not wise to attempt for 
Java 8.

Just as an FYI, this issue has in-flight conflicts with Martin's ongoing 
CharSequence.getChars patch.

Mike

On May 9 2013, at 23:03 , David Holmes wrote:

Short version:

Cache the value returned by toString and use it to copy-construct a new String 
on subsequent calls to toString(). Clear the cache on any mutating operation.

webrev: http://cr.openjdk.java.net/~dholmes/8013395/webrev.v2/

Testing: microbenchmark for toString performance; new regression test for 
correctness; JPRT testset core as a sanity check

Still TBD - full SE benchmark (?)

Thanks,
David
---------

Long version:

One of the goals for JDK8 is to provide a path from Java ME CDC to Java SE (or 
SE Embedded). In the embedded space some pretty old benchmarks still get used 
for doing comparisons between JRE's. One of which makes heavy use of 
StringBuffer.toString, without modifying the StringBuffer in between.

Up to Java 1.4.2 a StringBuffer and a String could share the underlying char[]. This 
meant that toString simply needed to create a new String that referenced the 
StringBuffer's char[] with no copying of the array needed. In Java 5 the 
String/StringBuffer implementations were completely revised: StringBuilder was introduced 
for non-synchronized use, and a new AbstractStringBuilder base class added for it and 
StringBuffer. In that implementation toString now has to copy the StringBuffer's char[]. 
This resulted in a significant performance regression for toString() and a bug - 6219959 
- was opened. There is quite an elaborate evaluation in that bug report but bottom line 
was that "real code doesn't depend on this - won't fix".

At some stage ME also updated to the new Java 5 code and they also noticed the 
problem. As a result CDC6 included a variation of the caching strategy that is 
proposed here.

Going forward because we want people to be able to compare ME and SE with their 
familiar benchmarks, we would like to address this corner case and fix it using 
the caching strategy outlined. As a data point an 8K StringBuffer that takes 
~1ms to be converted to a String initially, can process subsequent toString() 
calls in a few microseconds. So that performance issue is addressed.

However we've added a write to a field in all the mutating methods which 
obviously adds some additional computational effort - though I have no doubt it 
is lost in the noise for all but the smallest of mutating methods. Even so this 
should be run against regular SE benchmarks to ensure there are no performance 
regressions there - so if anyone has a suggestion as to the best benchmark to 
run to exercise StringBuffer (if it exists), please let me know.

Thanks for reading this far :)

Reply via email to