Re: Gap Buffer based AbstractStringBuilder implementation

Osvaldo Doederlein Fri, 20 Nov 2009 09:31:13 -0800

Hi Goktug,

Looks like good stuff. And it remembers me from another old issue: patterns
of code that demand data copying that could often be avoided. Every
StringBuffer/StringBuilder is ultimately consumed by a toString() invocation
to produce the result String, and in 99% of all uses, only one such String
is produced and the buffer object is immediately discarded. So, toString()
could pass the internal char[] to the String constructor, instead of forcing
its copy. The API is coded conservatively to always do that copy, because in
1% of cases people reuse the buffer, and also to trim the char[]; for
long-lived strings it would suck to retain extra chars in the heap. But
these cases are the exception rather than the rule, and we can just invent a
new method, e.g. toStringShared(), so the developer would only call that
when the tradeoff is positive: when the product String is short-lived and
the builder is not reused.


The same problem happens in other buffer-like APIs. In one project I had to
re-implement ByteArrayOutputStream, copying the source code from the JDK but
just adding a method that returns the internal array, because I was using it
for moderately large buffers, doing this several dozens of times per second,
and the cost of copying data was significant. It's worth notice that in some
cases when I can predict exactly the size of the buffer, I can use that
trick even if I don't want any extra bytes in the resulting array.

In any such APIs that don't require defensive copying for security reasons,
it should provide the "fast but dangerous" non-copying accessors. Even the
javac-produced concatenation code (for String's '+' operator) could use the
optimized toString() in some easy cases, e.g. when the produced string is
only used as a parameter to println(), Logger.log() and other well-known
APIs that are guaranteed to consume the string immediately and not retain
references.

OTOH, this RFE would be obsolete if HotSpot could detect and eliminate
unnecessary buffer copies automatically. Coincidentally, there's a recent
RFE committed in HotSpot that seems to do just that -
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6892658, C2 should
optimize some stringbuilder patterns. I wonder if this does exactly what I
want for StringBuilder, the bug description is very short (and the source
changeset very long for those not familiar with HotSpot dev...). Ideally
this should be generalized to any code that relies on Java's fundamental
primitives for buffer copying, like Arrays.copyOf/copyOfRange(). The only
problem with these "magic" optimizations though, is that they're often no
much good if only available in C2. Java is hopefully starting a renaissance
in the desktop, and any trick that requires -server is useless for applets
or WebStart apps, so a few low-level APIs are still useful. Even with a
better JIT, such APIs may still be useful because the developer can express
his intent - the compiler can detect that my StringBuilder object is not
reused after consumed by toString(), but it cannot detect that the produced
String is short-lived (except in the easy case of non-Escaping object) and
this information is also essential for the best possible code. So, a special
variant of toString() could serve as a hint for the optimizer.

A+
Osvaldo

Em 17/11/2009 06:16, Goktug Gokdogan escreveu:
> Hi.
>
> As you know, java.lang.StringXXXX classes favor insertion to the end in
terms of performance. Adding to beginning or middle of these sequences
causes most of the array to be shifted toward the end every time. Every ones
in a while I end up changing my algorithms those are based on prefixing
strings (ex. building a fully qualified name from class name to root
direction). Sometimes these kind of changes could result in less readable
code. And I should note that, there are lots of developers out there who
does not know the internal details of these classes.
>
> Anyway, this behavior was frustrating me for a while and I decided to
suggest a gap buffer based modification to AbstractStringBuilder to solve
this problem but never had time before to prototype it.
>
> I tested the original StringBuilder against the prototype. Preliminary
results for the duration of 100000 insertions of a short string:
>
>                       Original | Prototype
>            append =>     ~33   |   ~34
>  insert beginning =>   ~32000  |   ~38
>     insert random =>   ~16000  |  ~10000
>
>
> A negligible overhead appears for appending (which could be avoided with
shortcuts), but lots of performance gain achieved for other cases.
> If we handle insertion at zero as a special case in insert method or if we
add an another method like 'prepend', the insertion at beginning will show
exactly same performance characteristics of insertion at the end.
>
> In my opinion, this is a well-worth modification to string building
classes. If anybody agrees on sponsoring, I can tidy up code and contribute
to OpenJDK.
>
> - Goktug
>
>
> PS: I've used  following code for rough testing:
>
>
> private static final int LOOP_COUNT = 100000;
>
> public static void main(String[] args) {
> long nanoTime = System.nanoTime();
> testStandardAppend();
> //testAppendBeginning();
> //testAppendRandom();
> long span = System.nanoTime() - nanoTime;
> System.out.println(span / 1000000);
> }
>
> private static void testStandardAppend() {
> StringBuilder builder = new StringBuilder("initial");
> for (int i = 0; i < LOOP_COUNT; i++) {
> builder.append("tryouts");
> }
> }
>
> private static void testAppendBeginning() {
> StringBuilder builder = new StringBuilder("initial");
> for (int i = 0; i < LOOP_COUNT; i++) {
> builder.insert(0, "tryouts");
> }
> }
>
> private static void testAppendRandom() {
> Random random = new Random();
> StringBuilder builder = new StringBuilder("initial");
> for (int i = 0; i < LOOP_COUNT; i++) {
> builder.insert(random.nextInt(builder.length()), "tryouts");
> }
> }

Re: Gap Buffer based AbstractStringBuilder implementation

Reply via email to