On Wed, 24 May 2023 19:36:44 GMT, Aleksey Shipilev <[email protected]> wrote:
> UUID is a very important class that is used to track identities of objects
> in large scale systems. On some of our systems, `UUID.randomUUID` takes >1%
> of total CPU time, and is frequently a scalability bottleneck due to
> `SecureRandom` synchronization.
>
> The major issue with the UUID code itself is that it reads from a single
> `SecureRandom` instance 16 bytes at a time. So the heavily contended
> `SecureRandom` is bashed with very small requests. This also has a chilling
> effect on other users of `SecureRandom` when there is heavy UUID generation
> traffic.
>
> We can improve this by doing bulk reads from the backing `SecureRandom` and
> possibly striping the reads across many instances of it.
>
>
> Benchmark Mode Cnt Score Error Units
>
> ### AArch64 (m6g.4xlarge, Graviton, 16 cores)
>
> # Before
> UUIDRandomBench.single thrpt 15 3.545 ± 0.058 ops/us
> UUIDRandomBench.max thrpt 15 1.832 ± 0.059 ops/us ; negative scaling
>
> # After
> UUIDRandomBench.single thrpt 15 4.421 ± 0.047 ops/us
> UUIDRandomBench.max thrpt 15 6.658 ± 0.092 ops/us ; positive scaling, ~1.5x
>
> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
>
> # Before
> UUIDRandomBench.single thrpt 15 2.710 ± 0.038 ops/us
> UUIDRandomBench.max thrpt 15 1.880 ± 0.029 ops/us ; negative scaling
>
> # After
> UUIDRandomBench.single thrpt 15 3.099 ± 0.022 ops/us
> UUIDRandomBench.max thrpt 15 3.555 ± 0.062 ops/us ; positive scaling, ~1.2x
>
>
> Note that there is still a scalability bottleneck in the current default
> random (`NativePRNG`), because it synchronizes over a singleton instance for
> the SHA1 mixer, then the engine itself, etc. -- it is quite a whack-a-mole to
> figure out the synchronization story there. A scalability fix in the current
> default `SecureRandom` would be much more intrusive and risky, since it would
> change a core crypto class with unknown bug fanout.
>
> Using bulk reads is still a win even when the underlying PRNG is heavily
> synchronized. A more scalable PRNG would benefit from this as well. This PR
> adds a system property to select the PRNG implementation, and there we can
> clearly see the benefit with more scalable PRNG sources:
>
>
> Benchmark Mode Cnt Score Error Units
>
> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
>
> # Before, hacked `new SecureRandom()` to `SecureRandom.getInstance("SHA1PRNG")`
> UUIDRandomBench.single thrpt 15 3.661 ± 0.008 ops/us
> UUIDRandomBench...
src/java.base/share/classes/java/util/UUID.java line 255:
> 253: // initializations, and thus false sharing between reader threads.
> 254: random.nextBytes(buf);
> 255: for (int c = 0; c < BUF_SIZE; c += UUID_CHUNK) {
I think this could be faster if it used a ByteBuffer (or VarHandle) to process
the buffer as longs.
https://mail.openjdk.org/pipermail/core-libs-dev/2023-March/101249.html
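
To illustrate the idea, here is a minimal sketch of reading the bulk buffer as
longs through a `ByteBuffer`. It is not the PR's code: the class name
`BulkUuidSketch`, the method names, and the `BUF_SIZE` value of 4096 are
assumptions made for the example. Only `UUID_CHUNK` being 16 bytes follows from
the description above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical sketch; names and sizes are illustrative, not taken from the PR.
final class BulkUuidSketch {
    static final int UUID_CHUNK = 16;  // bytes of randomness per UUID
    static final int BUF_SIZE = 4096;  // assumed bulk-read size

    // Converts a buffer of random bytes into version 4 UUIDs,
    // consuming two longs (16 bytes) per UUID.
    static UUID[] fromRandomBytes(byte[] buf) {
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.BIG_ENDIAN);
        UUID[] out = new UUID[buf.length / UUID_CHUNK];
        for (int i = 0; i < out.length; i++) {
            long msb = bb.getLong();
            long lsb = bb.getLong();
            // Clear the version nibble, then set version 4.
            msb = (msb & ~0xF000L) | 0x4000L;
            // Clear the top two bits, then set the IETF variant (binary 10).
            lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
            out[i] = new UUID(msb, lsb);
        }
        return out;
    }

    // One bulk read from the PRNG yields BUF_SIZE / UUID_CHUNK UUIDs.
    static UUID[] generateBatch(SecureRandom random) {
        byte[] buf = new byte[BUF_SIZE];
        random.nextBytes(buf);
        return fromRandomBytes(buf);
    }
}
```

The version and variant bits are applied as long masks instead of per-byte
operations, so each UUID costs two 8-byte reads. Whether that is measurably
faster than the byte loop would need to be confirmed with the benchmark.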
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206118465