Hi Goktug, Looks like good stuff. And it remembers me from another old issue: patterns of code that demand data copying that could often be avoided. Every StringBuffer/StringBuilder is ultimately consumed by a toString() invocation to produce the result String, and in 99% of all uses, only one such String is produced and the buffer object is immediately discarded. So, toString() could pass the internal char[] to the String constructor, instead of forcing its copy. The API is coded conservatively to always do that copy, because in 1% of cases people reuse the buffer, and also to trim the char[]; for long-lived strings it would suck to retain extra chars in the heap. But these cases are the exception rather than the rule, and we can just invent a new method, e.g. toStringShared(), so the developer would only call that when the tradeoff is positive: when the product String is short-lived and the builder is not reused.
The same problem happens in other buffer-like APIs. In one project I had to re-implement ByteArrayOutputStream, copying the source code from the JDK but just adding a method that returns the internal array, because I was using it for moderately large buffers, doing this several dozens of times per second, and the cost of copying data was significant. It's worth notice that in some cases when I can predict exactly the size of the buffer, I can use that trick even if I don't want any extra bytes in the resulting array. In any such APIs that don't require defensive copying for security reasons, it should provide the "fast but dangerous" non-copying accessors. Even the javac-produced concatenation code (for String's '+' operator) could use the optimized toString() in some easy cases, e.g. when the produced string is only used as a parameter to println(), Logger.log() and other well-known APIs that are guaranteed to consume the string immediately and not retain references. OTOH, this RFE would be obsolete if HotSpot could detect and eliminate unnecessary buffer copies automatically. Coincidentally, there's a recent RFE committed in HotSpot that seems to do just that - http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6892658, C2 should optimize some stringbuilder patterns. I wonder if this does exactly what I want for StringBuilder, the bug description is very short (and the source changeset very long for those not familiar with HotSpot dev...). Ideally this should be generalized to any code that relies on Java's fundamental primitives for buffer copying, like Arrays.copyOf/copyOfRange(). The only problem with these "magic" optimizations though, is that they're often no much good if only available in C2. Java is hopefully starting a renaissance in the desktop, and any trick that requires -server is useless for applets or WebStart apps, so a few low-level APIs are still useful. Even with a better JIT, such APIs may still be useful because the developer can express his intent - the compiler can detect that my StringBuilder object is not reused after consumed by toString(), but it cannot detect that the produced String is short-lived (except in the easy case of non-Escaping object) and this information is also essential for the best possible code. So, a special variant of toString() could serve as a hint for the optimizer. A+ Osvaldo Em 17/11/2009 06:16, Goktug Gokdogan escreveu: > Hi. > > As you know, java.lang.StringXXXX classes favor insertion to the end in terms of performance. Adding to beginning or middle of these sequences causes most of the array to be shifted toward the end every time. Every ones in a while I end up changing my algorithms those are based on prefixing strings (ex. building a fully qualified name from class name to root direction). Sometimes these kind of changes could result in less readable code. And I should note that, there are lots of developers out there who does not know the internal details of these classes. > > Anyway, this behavior was frustrating me for a while and I decided to suggest a gap buffer based modification to AbstractStringBuilder to solve this problem but never had time before to prototype it. > > I tested the original StringBuilder against the prototype. Preliminary results for the duration of 100000 insertions of a short string: > > Original | Prototype > append => ~33 | ~34 > insert beginning => ~32000 | ~38 > insert random => ~16000 | ~10000 > > > A negligible overhead appears for appending (which could be avoided with shortcuts), but lots of performance gain achieved for other cases. > If we handle insertion at zero as a special case in insert method or if we add an another method like 'prepend', the insertion at beginning will show exactly same performance characteristics of insertion at the end. > > In my opinion, this is a well-worth modification to string building classes. If anybody agrees on sponsoring, I can tidy up code and contribute to OpenJDK. > > - Goktug > > > PS: I've used following code for rough testing: > > > private static final int LOOP_COUNT = 100000; > > public static void main(String[] args) { > long nanoTime = System.nanoTime(); > testStandardAppend(); > //testAppendBeginning(); > //testAppendRandom(); > long span = System.nanoTime() - nanoTime; > System.out.println(span / 1000000); > } > > private static void testStandardAppend() { > StringBuilder builder = new StringBuilder("initial"); > for (int i = 0; i < LOOP_COUNT; i++) { > builder.append("tryouts"); > } > } > > private static void testAppendBeginning() { > StringBuilder builder = new StringBuilder("initial"); > for (int i = 0; i < LOOP_COUNT; i++) { > builder.insert(0, "tryouts"); > } > } > > private static void testAppendRandom() { > Random random = new Random(); > StringBuilder builder = new StringBuilder("initial"); > for (int i = 0; i < LOOP_COUNT; i++) { > builder.insert(random.nextInt(builder.length()), "tryouts"); > } > }
