Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Peter Levart Wed, 27 May 2015 04:07:25 -0700


On 05/27/2015 11:36 AM, Ivan Gerasimov wrote:

Hi Sherman!
Please take a look at my other webrev, that I provided in the firstmessage.
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
I used StringJoiner there, which in some aspects seems to fit betterhere, comparing to StringBuilder.
It does reduce the code, but of course runs slower.
In that version every part of the source string had to be converted toa separate String, which add another redundant copying.
I still prefer the version, which deals with arrays. I don't thinkit's too complex.It surely adds some complexity to the original code, but it's only 70lines of code in total.
Everything else are comments and empty lines.

Sincerely yours,
Ivan


Hi Ivan,

If I was to choose, I would take the 1st variant (with index arrays)too. The speed does matter!

But anyway, every now and then one would like to write an algorithm thatgathers sub-strings and then either joins them or dumps them to somebuffer that eventually becomes another string or array. SinceString.substring() changed when offset/length fields were removed, thereis no official way to handle sub-sequences as "vews" of underlyingString. There was some debate in the past about using String'simplementation of CharSequence.subSequence() for that purpose - toreturn a private CharSequence implementation that would represent a viewof underlying String's array, but that was not pursued because oflimiting API specification that says:


"Returns a character sequence that is a subsequence of this sequence.

An invocation of this method of the form
       str.subSequence(begin, end)
behaves in exactly the same way as the invocation
       str.substring(begin, end)"

My opinion is that this part of text did matter at the time whensubstring() did return a view over underlying array, but when thesubstring method changed, there was no reason to keep the specificationbound to that text. It would have been wiser to keep the behaviorunchanged and change the specification. Anyway, what's done is done. Idon't know if String.subSequence can be changed now. Maybe?

What could be done is introduce some new API. It could be in the form ofa new class or just a static method (on CharSequence perhaps?) thatwould return a private implementation of CharSequence as a sub-sequenceview over underlying CharSequence or String's array in case thepassed-in argument is a String.

With such API one could gather sub-sequence view objects into aLinkedList and then dump them into the correctly pre-sizedStringBuilder. There could even be a String counstructor with thefollowing signature:


public String(Iterable<CharSequence> sequences, int estimatedLength) { ...

..which would eliminate the last copying step too. Hopefully alltemporary view objects would be found as method-local and would beallocated on stack / in registers by JIT (wishful thinking)...


Regards, Peter

On 27.05.2015 2:38, Xueming Shen wrote:
Ivan,
It might be worth trying String.index + StringBuilder, instead ofwriting/handling everything yourself.Yes, it inevitably adds an arraycopy at the end to convert theStrinbBuilder to String, but it might havebetter balance between performance and code complexity. The regex isprobably a little heavy for
literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:
I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZEwith Integer.MAX_VALUE, which seems to be more accurate here.
And I want to add that this proposed implementation is not onlyfaster, but also more memory efficient.The following simple stress-test shows that the proposed version isable to handle twice larger strings, comparing to the currentimplementation.
----------------------------------------
public class C {
    public static void main(String[] args) throws Throwable {
        String s = "string";
        for (int i = 1; i < Integer.MAX_VALUE; ++i) {
            try {
                s = s.replace("string", "stringstring");
            } catch (OutOfMemoryError o) {
                System.out.println(i + ") " + s.length());
                break;
            }
        }
    }
}
----------------------------------------

$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real    0m4.525s
user    0m4.402s
sys    0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build1.9.0-internal-igerasim_2015_05_23_19_25-b00)Java HotSpot(TM) 64-Bit Server VM (build1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)
26) 201326592

real    0m2.139s
user    0m1.960s
sys    0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:
Hello everybody!
I know many people here like it when the performance is gettingbetter.It was suggested to make the literal variant of String.replace()faster.Currently, this method is implemented as a few calls to regexp API,so that the whole implementation takes only two lines of code.
I've created two versions of the fix.
In the first one, we scan the string and store indices of the foundsubstrings in an array.
Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/
The second variant is much less verbose, however it's lessefficient too.
Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
Here are the micro-benchmark results (in a string of ~300 chars do~15 replacements).
0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s
Personally, I like my first variant better, even though it addsalmost 300 lines of code.
But I'd like to hear what people think of it.

Sincerely yours,
Ivan

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

Reply via email to