All,
There are many cases in Tomcat where we change the letter-case of a
String value so it's easier to compare when case doesn't matter. In
particular, HTTP header names and many spec-defined values are supposed
to be case-insensitive and so all comparisons involving them must be
done without regard to letter-case.
The idiom in Tomcat source code for that is[1]:
collection.add(element.toLowerCase(Locale.ENGLISH));
Locale.ENGLISH is used because all of these values are supposed to be in
ASCII encoding and Locale.ENGLISH is as good as any equivalent Locale
that (nominally) uses (mostly) ASCII semantics.
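To illustrate why the Locale argument matters at all, here is a small sketch. The class and method names (LocaleCaseDemo, lowerEnglish, lowerTurkish) are made up for this example and are not Tomcat code. It shows that the same input lower-cases differently under a Turkish locale, which is the kind of surprise a fixed Locale.ENGLISH avoids:

```java
import java.util.Locale;

// Hypothetical demo class, not part of Tomcat.
final class LocaleCaseDemo {

    // The idiom used in Tomcat: a fixed Locale with ASCII-like semantics.
    static String lowerEnglish(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    // Same call with a Turkish locale, where 'I' maps to dotless 'ı' (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        System.out.println(lowerEnglish("TITLE"));
        System.out.println(lowerTurkish("TITLE"));
    }
}
```

With the Turkish locale the result no longer equals "title", so a lookup keyed on the ASCII form would miss.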
It turns out that String.toLowerCase (and its mirror,
String.toUpperCase) has a ton of code in it to manage the many
complexities of Locales in which we are not interested.
Implementing an ASCII-only version of toLowerCase appears to have a
speed improvement of roughly 2x for some simple cases. I have a sample
microbenchmark below and the output of jmh on Java 17.
Given the frequency of calls to toLowerCase (many times per request), I
think it may be a worthwhile performance improvement to implement and
use our own version of toLowerCase and use it when only ASCII is expected.
It may even be possible to write a more complicated version of
toLowerCase than I have below that performs even faster (e.g. for String
values that end up not having any upper-case characters at all).
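As a rough sketch of that idea (untested for performance, and the names AsciiCase and toLowerCaseAscii are hypothetical, not existing Tomcat API): scan for the first upper-case ASCII letter and return the original String unchanged if there is none, so the common already-lower-case input allocates nothing.

```java
// Hypothetical helper, not part of Tomcat; a sketch under the assumption
// that only 'A'..'Z' need converting and all other chars pass through as-is.
final class AsciiCase {

    private AsciiCase() {
    }

    static String toLowerCaseAscii(String s) {
        int len = s.length();

        // Fast path: find the first upper-case ASCII letter, if any.
        int i = 0;
        while (i < len) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                break;
            }
            i++;
        }
        if (i == len) {
            // Nothing to convert: return the same instance, no allocation.
            return s;
        }

        // Slow path: copy once, then convert from the first hit onwards.
        char[] result = s.toCharArray();
        for (; i < len; i++) {
            char c = result[i];
            if (c >= 'A' && c <= 'Z') {
                result[i] = (char) (c + ('a' - 'A'));
            }
        }
        return new String(result);
    }
}
```

Whether the extra scan pays off would need to be measured with jmh; it only helps if a meaningful share of inputs are already lower-case.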
WDYT?
-chris
[1]
https://github.com/apache/tomcat/blob/feb77a15849389001ebcfdd623df86a42a62019e/java/org/apache/tomcat/util/http/parser/TokenList.java#L95
Benchmark                                Mode  Cnt         Score         Error  Units
MyBenchmark.testStringToUpperCase       thrpt    5  28130795.259 ± 1297495.570  ops/s
MyBenchmark.testStringToUpperCaseASCII  thrpt    5  52221288.421 ± 5112349.492  ops/s
Source:
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations=5, time=5, timeUnit=TimeUnit.SECONDS)
@Measurement(iterations=5, time=5, timeUnit=TimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
@Fork(1)
public class MyBenchmark {
    private static final String SOURCE = "X-Frame-Options";

    // Baseline: the JDK's locale-aware conversion (no-arg form uses the
    // default Locale).
    @Benchmark
    public String testStringToUpperCase() {
        return SOURCE.toUpperCase();
    }

    // Candidate: the ASCII-only conversion below.
    @Benchmark
    public String testStringToUpperCaseASCII() {
        return toUpperCaseASCII(SOURCE);
    }

    // Converts only 'a'..'z'; every other char is copied unchanged.
    public String toUpperCaseASCII(String s) {
        int len = s.length();
        char[] result = new char[len];
        for(int i=0; i<len; i++) {
            char c = s.charAt(i);
            if(c >= 'a' && c <= 'z') {
                c -= 32; // 'a' - 'A' == 32 in ASCII
            }
            result[i] = c;
        }
        return new String(result);
    }
}