On Mon, 23 Mar 2026 15:39:23 GMT, Roman Kennke <[email protected]> wrote:

> I analyzed the performance of Thread.setName() in response to a customer 
> workload running Cassandra, where Thread.setName() showed up (mostly because 
> of rather pathetic use of setName() from Cassandra *sigh*).
> 
> Profiling showed that most time (around 75%) is spent in the actual syscall, 
> so there are limits on what we can do. There are some fixed overheads, like the 
> cost of synchronized, and also some costs that scale with the length of the 
> name, most importantly the UTF8 conversion.
> 
> I implemented the following improvements:
>  - Removed synchronized from setName(), as suggested by some folks in the JBS 
> issue. This saves ~15 nanoseconds. Not sure whether Cassandra calls the method 
> under contention; if so, the savings might be much larger.
>  - Almost all thread names are Latin1/ASCII, and there is no need to convert 
> to UTF8 in that case. Also, the various OS APIs to set the thread name don't 
> even seem to specify the character encoding. Avoiding the UTF8 conversion 
> brings down the length-dependent costs. In many cases we can also pass down 
> the backing array of the string and avoid copying.
>  - When the name doesn't change, we can skip updating the native name, which 
> makes setName() almost a no-op.
>  - For truncating the name on Linux to 16 chars, instead of using snprintf 
> with a pattern, we can simply stitch together the name directly (first 7 
> chars, last 6 chars, 2 dots in between), this saves ~100ns.
> 
> In the end, we reduce the cost of setName() for short names by ~7% and for 
> longer names by 20% or more, and completely remove the conversion overhead 
> that primarily affected longer names.
> 
>   | Benchmark   | (length) | Baseline (ns/op) | Optimized (ns/op) |  Change |
>   |-------------|---------:|-----------------:|------------------:|--------:|
>   | setName     |        1 |    602.3 ±  2.0  |     561.9 ±  1.5  |   -6.7% |
>   | setName     |        4 |    605.9 ±  2.1  |     570.2 ±  1.2  |   -5.9% |
>   | setName     |       15 |    617.1 ±  2.7  |     570.4 ±  2.8  |   -7.6% |
>   | setName     |       16 |    712.1 ±  6.0  |     569.4 ±  2.7  |  -20.0% |
>   | setName     |       50 |    757.9 ±  5.2  |     566.3 ±  4.6  |  -25.3% |
>   | setName     |      200 |    986.2 ±  2.7  |     569.9 ±  4.9  |  -42.2% |
>   | setNameSame |        1 |             —    |       7.4 ±  0.0  |     —   |
>   | setNameSame |        4 |             —    |       7.4 ±  0.0  |     —   |
>   | setNameSame |       15 |             —    |       7.4 ±  0.0  |     —   |
>   | setN...

I'd like to know more about the way this API was being (mis)used, to get a 
better sense of whether this is just pandering to an extreme case. I'm not a 
fan of all this "micro-optimisation", and it seems to me it could well slow 
down normal use cases. I would like to see performance numbers for all 
platforms (Linux, macOS, Windows).

The synchronization is needed.

src/hotspot/os/bsd/os_bsd.cpp line 2261:

> 2259:     // Add a "Java: " prefix to the name
> 2260:     char buf[MAXTHREADNAMESIZE];
> 2261:     (void) os::snprintf(buf, sizeof(buf), "Java: %.*s", (int)len, name);

Surprised this would make any difference.

src/hotspot/os/linux/os_linux.cpp line 4887:

> 4885:   // (e.g. "Dispatc..read21").
> 4886:   if (len >= sizeof(buf)) {
> 4887:     // truncate: first 7 bytes, "..", 6 bytes from the end = 7+2+6 = 15, then NUL terminator

Better add a comment that this is more performant than snprintf - else someone 
will "fix" it.
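
For readers following along, the stitching scheme quoted above (first 7 bytes, "..", last 6 bytes) can be sketched as below. This is only an illustration in Java, not the C++ code from os_linux.cpp: the class name `ThreadNameTruncation`, the method `truncate`, and the constant `MAX_VISIBLE` are hypothetical, and the sketch works on `String` characters rather than raw bytes.

```java
// Hypothetical illustration of the "first 7 + '..' + last 6" truncation.
// Assumption: the native buffer holds 16 bytes including the NUL terminator,
// so at most 15 visible characters fit.
public final class ThreadNameTruncation {
    static final int MAX_VISIBLE = 15;

    static String truncate(String name) {
        int n = name.length();
        if (n <= MAX_VISIBLE) {
            return name;                      // fits as-is, nothing to do
        }
        // Stitch head and tail together directly instead of formatting
        // through a pattern: 7 + 2 + 6 = 15 characters.
        return name.substring(0, 7) + ".." + name.substring(n - 6);
    }

    public static void main(String[] args) {
        System.out.println(truncate("DispatcherThread21"));
    }
}
```

With the input `DispatcherThread21` this yields the `Dispatc..read21` form mentioned in the existing source comment.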

src/hotspot/os/windows/os_windows.cpp line 1068:

> 1066: void os::set_native_thread_name(const char *name, size_t len) {
> 1067:   // Windows APIs require NUL-terminated strings; the name pointer
> 1068:   // may not be NUL-terminated, so copy into a local buffer.

Surely this makes things slower on Windows!

src/hotspot/os/windows/os_windows.cpp line 1069:

> 1067:   // Windows APIs require NUL-terminated strings; the name pointer
> 1068:   // may not be NUL-terminated, so copy into a local buffer.
> 1069:   char stack_buf[256];

Where does the 256 limit come from?

src/java.base/share/classes/java/lang/Thread.java line 1793:

> 1791:      * @see        #getName
> 1792:      */
> 1793:     public final void setName(String name) {

You can't get rid of synchronization here as the method needs to be atomic wrt. 
setting the Java level name and the native name.
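
As a toy model of the atomicity concern (a sketch only, not the JDK code): the class `NamedModel` and its fields are hypothetical, and the `nativeName` field merely stands in for the OS-level thread name. With both updates under one monitor, no observer holding the same lock can see the two names disagree.

```java
// Hypothetical model: two pieces of state that must change together.
public final class NamedModel {
    private String javaName = "";
    private String nativeName = "";   // stand-in for the OS-level name

    // Both writes happen under the same monitor, so they appear atomic
    // to any other synchronized method on this object.
    public synchronized void setName(String name) {
        javaName = name;
        nativeName = name;            // stand-in for the native call
    }

    public synchronized boolean isConsistent() {
        return javaName.equals(nativeName);
    }

    public static void main(String[] args) throws InterruptedException {
        NamedModel m = new NamedModel();
        Thread a = new Thread(() -> { for (int i = 0; i < 10_000; i++) m.setName("a" + i); });
        Thread b = new Thread(() -> { for (int i = 0; i < 10_000; i++) m.setName("b" + i); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(m.isConsistent());
    }
}
```

Dropping `synchronized` from `setName` in this model would allow two callers to interleave between the two assignments and leave the fields holding different names.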

src/java.base/share/classes/java/lang/Thread.java line 1799:

> 1797:         String oldName = this.name;
> 1798:         this.name = name;
> 1799:         if (!isVirtual() && Thread.currentThread() == this && !name.equals(oldName)) {

Surely this slows down normal usage!

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/30374#pullrequestreview-3996171742
PR Review Comment: https://git.openjdk.org/jdk/pull/30374#discussion_r2978928095
PR Review Comment: https://git.openjdk.org/jdk/pull/30374#discussion_r2978930731
PR Review Comment: https://git.openjdk.org/jdk/pull/30374#discussion_r2978935900
PR Review Comment: https://git.openjdk.org/jdk/pull/30374#discussion_r2978937388
PR Review Comment: https://git.openjdk.org/jdk/pull/30374#discussion_r2979133036
PR Review Comment: https://git.openjdk.org/jdk/pull/30374#discussion_r2979138335
