So we are talking about a saving of around 35 nanoseconds per call.
I just ran Tomcat 10.1.x through a profiler and requesting the Tomcat
homepage triggered String.toLowerCase() just under 100 times (some of
those calls may be from other JRE methods). So at best, we are looking
at 3 microseconds per request. That is pretty small but it all adds up
over time so +1 from me, provided we have some sort of unit test that
confirms the custom code is faster than the JRE code, in case
circumstances change in the future.
Mark
On 13/09/2023 17:38, Christopher Schultz wrote:
All,
Ping. I've added a few other implementations which will e.g. perform no
char-copy if the string is already in lower-case, so they are faster
under special circumstances.
I'm happy to share my jmh runs, which seem to show that Java's
String.toLowerCase is getting faster and faster every time I run the
benchmark, which is puzzling.
Thanks,
-chris
On 9/8/23 13:39, Christopher Schultz wrote:
All,
Please ignore the fact that my benchmark is all oriented around
toUpperCase instead of toLowerCase :)
-chris
On 9/8/23 13:25, Christopher Schultz wrote:
All,
There are many cases in Tomcat where we change the letter-case of a
String value so it's easier to compare when case doesn't matter. In
particular, HTTP header names and many spec-defined values are
supposed to be case-insensitive and so all comparisons involving them
must be done without regard to letter-case.
The idiom in Tomcat source code for that is[1]:
collection.add(element.toLowerCase(Locale.ENGLISH));
Locale.ENGLISH is used because all of these values are supposed to be
in ASCII encoding and Locale.ENGLISH is as good as any equivalent
Locale that (nominally) uses (mostly) ASCII semantics.
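To illustrate why the Locale argument matters at all, here is a minimal sketch (the class and method names are made up for this example, not Tomcat code). It lower-cases the same ASCII string under Locale.ENGLISH and under a Turkish locale, where the JDK maps 'I' to the dotless U+0131 instead of 'i':

```java
import java.util.Locale;

public class LocaleCaseDemo {

    // Hypothetical helper: lower-case with an explicit Locale so the
    // result does not depend on the JVM's default locale.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // ASCII semantics: 'I' becomes 'i'.
        System.out.println(lower("TITLE", Locale.ENGLISH));

        // Turkish semantics: 'I' becomes dotless i (U+0131), so a
        // comparison against "title" would fail.
        System.out.println(lower("TITLE", turkish));
    }
}
```

Passing a fixed Locale keeps header-name comparisons stable regardless of where the server runs, which is the point of the idiom above.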
It turns out that String.toLowerCase (and its mirror,
String.toUpperCase) has a ton of code in it to manage the many
complexities of Locales in which we are not interested.
Implementing an ASCII-only version of toLowerCase appears to have a
speed improvement of roughly 2x for some simple cases. I have a
sample microbenchmark below and the output of jmh on Java 17.
Given the frequency of calls to toLowerCase (many times per request),
I think it may be a worthwhile performance improvement to implement
and use our own version of toLowerCase and use it when only ASCII is
expected.
It may even be possible to write a more complicated version of
toLowerCase than I have below that performs even faster (e.g. for
String values that end up not having any upper-case characters at all).
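As a rough sketch of what such a variant might look like (the class and method names here are assumptions for illustration, not an existing Tomcat API), the version below scans for the first upper-case ASCII letter and returns the original String untouched if there is none, so no char array is allocated in that case:

```java
public final class AsciiCase {

    private AsciiCase() {
        // Utility class, not meant to be instantiated.
    }

    // Hypothetical ASCII-only lower-casing. Characters outside 'A'..'Z'
    // (including all non-ASCII characters) are left unchanged.
    public static String toLowerCaseAscii(String s) {
        int len = s.length();

        // Find the first character that actually needs converting.
        int first = -1;
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                first = i;
                break;
            }
        }

        // Fast path: nothing to convert, so return the same instance.
        if (first < 0) {
            return s;
        }

        // Slow path: copy once, then convert from the first hit onwards.
        char[] result = s.toCharArray();
        for (int i = first; i < len; i++) {
            char c = result[i];
            if (c >= 'A' && c <= 'Z') {
                result[i] = (char) (c + 32);
            }
        }
        return new String(result);
    }
}
```

Whether the extra scan pays off depends on how often inputs are already lower-case, so it would need to be measured rather than assumed.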
WDYT?
-chris
[1]
https://github.com/apache/tomcat/blob/feb77a15849389001ebcfdd623df86a42a62019e/java/org/apache/tomcat/util/http/parser/TokenList.java#L95
Benchmark                                Mode  Cnt         Score         Error  Units
MyBenchmark.testStringToUpperCase       thrpt    5  28130795.259 ± 1297495.570  ops/s
MyBenchmark.testStringToUpperCaseASCII  thrpt    5  52221288.421 ± 5112349.492  ops/s
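For reference, the throughput scores above can be converted to approximate per-call times. This is only a sketch: the class name is made up, and it assumes the benchmark ran on a single thread (the jmh default) so that time per call is simply the inverse of ops/s.

```java
public class ThroughputToLatency {

    // Converts a throughput in operations per second into
    // nanoseconds per operation (single-threaded assumption).
    static double nanosPerOp(double opsPerSecond) {
        return 1e9 / opsPerSecond;
    }

    public static void main(String[] args) {
        double jre = 28130795.259;    // MyBenchmark.testStringToUpperCase
        double ascii = 52221288.421;  // MyBenchmark.testStringToUpperCaseASCII

        // Roughly 35.5 ns per call for the JRE method.
        System.out.printf("JRE:     %.1f ns/op%n", nanosPerOp(jre));
        // Roughly 19.1 ns per call for the ASCII-only method.
        System.out.printf("ASCII:   %.1f ns/op%n", nanosPerOp(ascii));
        // Speed-up factor of about 1.86.
        System.out.printf("Speedup: %.2fx%n", ascii / jre);
    }
}
```

The error margins reported by jmh are ignored here, so the figures are only indicative.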
Source:
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations=5, time=5, timeUnit=TimeUnit.SECONDS)
@Measurement(iterations=5, time=5, timeUnit=TimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
@Fork(1)
public class MyBenchmark {
    private static final String SOURCE = "X-Frame-Options";

    @Benchmark
    public String testStringToUpperCase() {
        return SOURCE.toUpperCase();
    }

    @Benchmark
    public String testStringToUpperCaseASCII() {
        return toUpperCaseASCII(SOURCE);
    }

    public String toUpperCaseASCII(String s) {
        int len = s.length();
        char[] result = new char[len];
        for(int i=0; i<len; i++) {
            char c = s.charAt(i);
            if(c >= 'a' && c <= 'z') {
                c -= 32;
            }
            result[i] = c;
        }
        return new String(result);
    }
}
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org