On Sun, 20 Dec 2020 20:45:55 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>>> I have to say that introducing a ThreadLocal here seems like a step in the >>> wrong direction. With a ThreadLocal, if I read this correctly, a >>> MessageDigest will be cached with each thread that ever calls this API, and >>> it won't be garbage collected until the thread dies. Some threads live >>> indefinitely, e.g., the ones in the fork-join common pool. Is this what we >>> want? Is UUID creation so frequent that each thread needs its own, or could >>> thread safety be handled using a simple locking protocol? >> >> This is a good point. Significant effort has gone into recent releases to >> reduce the use of TLs in the (JDK-8245309, JDK-8245308, JDK-8245304, >> JDK-8245302) so adding new usages is a disappointing. So I think this PR >> does need good justifications, and probably a lot more exploration of >> options. > > As Stuart argues a TL approach is likely to look better in a microbenchmark, > but make things worse in other regards. > > I started exploring options to this in #1855 (not logged an RFE, yet), and I > think there is potential to get most of the gains seen here without the > introduction of a `ThreadLocal`. The current patch also speeds up MD5.digest > for small chunks (+19% for 16 bytes, but less relative impact on larger > chunks, down to being lost in the noise on 16Kb). Speed-up of > `UUID.nameUUIDFromBytes` is somewhat modest right now (+4%, -17% > allocations), but I think it can be improved further without complicating > things too much. The MD5 intrinsic added by JDK-8250902 adds some constraints > on the implementation that holds back some polish. This can be fixed, but > requires some coordination. A trick that could be used here instead of `ThreadLocal` is to store an instance of MD5 retrieved via MessageDigest.getInstance, but clone it before use. See [this commit](https://github.com/openjdk/jdk/pull/1855/commits/2f266316d62ca875c38e2f771412d02625414bf4). This maintains thread safety, avoids the allure of TLs, and gives a substantial speed-up on the `UUIDBench.fromType3Bytes` micro (comparing it against the other optimizations that were already in #1855): Benchmark (size) Mode Cnt Score Error Units UUIDBench.fromType3Bytes 20000 thrpt 12 1.523 ± 0.066 ops/us UUIDBench.fromType3Bytes:·gc.alloc.rate.norm 20000 thrpt 12 408.036 ± 0.003 B/op Benchmark (size) Mode Cnt Score Error Units UUIDBench.fromType3Bytes 20000 thrpt 12 2.186 ± 0.228 ops/us UUIDBench.fromType3Bytes:·gc.alloc.rate.norm 20000 thrpt 12 264.023 ± 0.006 B/op ------------- PR: https://git.openjdk.java.net/jdk/pull/1821