On Mon, 29 Sep 2025 15:08:14 GMT, Kieran Farrell <[email protected]> wrote:

>> With the recent approval of UUIDv7 
>> (https://datatracker.ietf.org/doc/rfc9562/), this PR aims to add a new 
>> static method UUID.timestampUUID() which constructs and returns a UUID in 
>> support of the new time generated UUID version. 
>> 
>> The specification requires embedding the current timestamp in milliseconds 
>> in bits 0–47. The version number goes in bits 48–51, and bits 52–63 are 
>> available for sub-millisecond precision or for pseudorandom data. The 
>> variant is set in bits 64–65. The remaining bits 66–127 are free to use for 
>> more pseudorandom data or to employ a counter-based approach for increased 
>> time precision 
>> (https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-version-7).
>> 
>> The choice of implementation comes down to balancing the ability to 
>> distinguish UUIDs created less than 1 ms apart against performance. 
>> A test simulating a high-concurrency environment, with 4 threads generating 
>> 10000 UUIDv7 values in parallel, measured the collision rate of each 
>> implementation (the number of times the time-based portion of the UUID was 
>> not unique and entries could not be distinguished by time) and yielded the 
>> following results:
>> 
>> 
>> - random-byte-only - 99.8%
>> - higher-precision - 3.5%
>> - counter-based - 0%
>> 
>> 
>> Performance tests show a decrease in performance as expected with the 
>> counter based implementation due to the introduction of synchronization:
>> 
>> - random-byte-only - 143.487 ± 10.932 ns/op
>> - higher-precision - 149.651 ± 8.438 ns/op
>> - counter-based - 245.036 ± 2.943 ns/op
>> 
>> The best balance here might be to employ a higher-precision implementation 
>> as the large increase in time sensitivity comes at a very slight performance 
>> cost.
>
> Kieran Farrell has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   missing semicolon

Adding support for UUID v7 also includes **sorting correctly**, IMO.

This has always been incorrect in the JDK as I see it, but back in the days of 
UUIDv1 to v4 nobody really cared that much how a UUID would sort. Enter UUID v7 
and sorting is now important to get right.

So what is the problem? The existing `UUID.compareTo()` method compares the 
two longs (nothing wrong with that), but those longs are SIGNED and what is 
needed is an UNSIGNED comparison.
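
To make the difference concrete, here is a minimal sketch that compares one pair of UUIDs twice: once with the built-in `compareTo()` and once by comparing their canonical hex strings, which orders them byte by byte. The class and method names (`SignedCompareDemo`, `jdkOrder`, `textOrder`) and the two sample values are made up for illustration; the pair is chosen so that the top bit of the most significant half differs.

```java
import java.util.UUID;

class SignedCompareDemo {
    // Sign of the JDK's natural ordering (signed comparison of the two longs).
    static int jdkOrder(String a, String b) {
        return Integer.signum(UUID.fromString(a).compareTo(UUID.fromString(b)));
    }

    // Sign of the ordering of the canonical lowercase hex strings,
    // which matches an unsigned byte-by-byte comparison.
    static int textOrder(String a, String b) {
        String x = UUID.fromString(a).toString();
        String y = UUID.fromString(b).toString();
        return Integer.signum(x.compareTo(y));
    }

    public static void main(String[] args) {
        String low = "7fffffff-ffff-4fff-bfff-ffffffffffff";
        String high = "80000000-0000-4000-8000-000000000000";
        System.out.println(jdkOrder(low, high));  // prints 1: the JDK sorts "low" after "high"
        System.out.println(textOrder(low, high)); // prints -1: as text, "low" comes first
    }
}
```

The two orderings disagree only when the operands differ in the top bit of the half that decides the comparison; otherwise signed and unsigned comparison give the same answer.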

The problem was recognized years ago in 
[JDK-7025832](https://bugs.openjdk.org/browse/JDK-7025832), but the change was 
rejected due to concerns over backward compatibility.

The problem - once UUID v7 is introduced - is that it becomes apparent that the 
JDK does not sort UUIDs the same way databases do, or indeed any other 
language. Previously, this was less of a concern because there was less 
reason to sort UUIDs.

To be specific, what you expect - and what both the old RFC-4122 spec and the 
newer RFC-9562 state in their own words - is that UUIDs should be 
lexicographically sorted, i.e. as if by comparing two arrays of bytes (len=16) 
where each byte is a value 0-255 (as opposed to a value -128 to 127). An 
implementation could be:


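public int compareToLexi(UUID val) {
    // Compare the high 64 bits first, treating them as unsigned.
    int cmp = Long.compareUnsigned(this.mostSigBits, val.mostSigBits);
    // Fall back to the low 64 bits only when the high halves are equal.
    return cmp != 0 ? cmp : Long.compareUnsigned(this.leastSigBits, val.leastSigBits);
}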


This would give exactly the same result as a method which compares the two 
UUIDs as 16-byte arrays of unsigned bytes, as described above.
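
That equivalence can be checked from outside the `UUID` class. The following is a rough sketch, not a proposed patch: it serializes each UUID to 16 big-endian bytes and compares them with `Arrays.compareUnsigned` (Java 9+), then checks that the sign agrees with the two-long unsigned comparison on a seeded random sample. The class name `LexiCompareCheck`, its method names, and the seed are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;
import java.util.UUID;

class LexiCompareCheck {
    // Serialize to 16 bytes, most significant byte first (ByteBuffer is big-endian by default).
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Sign of the byte-array comparison with each byte treated as 0-255.
    static int compareAsBytes(UUID a, UUID b) {
        return Integer.signum(Arrays.compareUnsigned(toBytes(a), toBytes(b)));
    }

    // Sign of the two-long unsigned comparison, using only public accessors.
    static int compareAsLongs(UUID a, UUID b) {
        int cmp = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        if (cmp == 0) {
            cmp = Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
        }
        return Integer.signum(cmp);
    }

    // True if both comparisons agree on n pairs of arbitrary 128-bit values.
    static boolean agreeOnRandomSample(int n) {
        Random rnd = new Random(42); // fixed seed so the check is repeatable
        for (int i = 0; i < n; i++) {
            UUID a = new UUID(rnd.nextLong(), rnd.nextLong());
            UUID b = new UUID(rnd.nextLong(), rnd.nextLong());
            if (compareAsBytes(a, b) != compareAsLongs(a, b)) {
                return false;
            }
        }
        return true;
    }
}
```

A random sample is only a sanity check, not a proof. The argument itself is that comparing two big-endian unsigned 64-bit values is the same as comparing their eight bytes from left to right.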

I am not suggesting that the existing `compareTo()` logic be changed. But at 
the very least this legacy problem should be highlighted somewhere in the 
Javadoc. Addressing this, at least with a comment, would be part of a proper 
UUIDv7 implementation.
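
For callers who need the lexicographic order without any change to the JDK, one option is an external `Comparator`. This is only a sketch of what such a workaround might look like; `UuidOrdering`, `UNSIGNED_ORDER` and `sortedUnsigned` are hypothetical names, not existing or proposed JDK API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

class UuidOrdering {
    // Orders UUIDs as unsigned 128-bit values: high half first, then low half.
    static final Comparator<UUID> UNSIGNED_ORDER = (a, b) -> {
        int cmp = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return cmp != 0
                ? cmp
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    };

    // Returns a new list sorted with UNSIGNED_ORDER; the input is left untouched.
    static List<UUID> sortedUnsigned(Collection<UUID> input) {
        List<UUID> out = new ArrayList<>(input);
        out.sort(UNSIGNED_ORDER);
        return out;
    }
}
```

Passing the comparator explicitly, for example to `List.sort` or a `TreeMap` constructor, leaves the natural ordering of `UUID` as it is, so existing code that relies on `compareTo()` is unaffected.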

My 2c.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25303#issuecomment-3352041251
