[ https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802209#comment-16802209 ]
Jon Meredith commented on CASSANDRA-15064: ------------------------------------------ I can confirm the behavior you're seeing, however I don't think it's something that should be changed. Cassandra uses the ClockSeq in the manner described in section 4.1.5 as a mechanism to avoid generating duplicate UUIDs if the server is restarted witht the clock in the past, or comes up with a different node id [1]. As an aside, if the time reported by the OS via System.currentTimeMillis goes backwards while the server is running, the *timestamp* (not clock seq) will be constrained to be 100ns later than the previous timestamp returned until time catches up (or Cassandra is restarted). The node id and clock sequence are initialized on server startup and should not change during execution, so I was wondering if you generated them synthetically? Here are the UUIDs you supplied broken into components by java.util.UUID {noformat} 1 UUID 84e2c963-4ef9-11e9-b57e-f0def1d0755e = variant[0b10] version[0b1] timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x0000357e] 2 UUID 84e2c963-4ef9-11e9-b57f-f0def1d0755e = variant[0b10] version[0b1] timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x0000357f] 3 UUID 84e2c963-4ef9-11e9-b580-f0def1d0755e = variant[0b10] version[0b1] timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x00003580] 4 UUID 84e2c963-4ef9-11e9-b581-f0def1d0755e = variant[0b10] version[0b1] timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x00003581] 5 UUID 84e2c963-4ef9-11e9-b582-f0def1d0755e = variant[0b10] version[0b1] timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x00003582] {noformat} The reason the UUID [3] and TimeUUID [4] types compare differently for the clock sequence (and node) is that they have slightly different comparison functions. The TimeUUID treats clk_seq_hi as a signed byte and then clk_seq_low/node as a series of unsigned bytes (not a single integer), however the UUIDType comparison just treats it as a single signed long. It looks like this behavior was preserved during the performance refactor in CASSANDRA-8730. As for the correct ordering, although the spec does say to treat the clock sequence as an unsigned integer, it doesn't really convey any meaning as it is assigned from a random number generator. (FWIW, The RFC also recommends lexical ordering by time_low, time_mid, then time_hi which seemed odd to me but is confirmed by reading the uuid_compare function in Appendix A). For both TimeUUID / UUID the primary ordering is by 100-nsec resolution timestamp first which is what we want when dealing with timestamps, then deterministically on the node/clockseq (although unfortunately in a different order depending on TimeUUID/UUID). Changing the comparison function affects the order data is stored on disk and would require users to perform some kind of migration (automated or manual), and although it would be nice to make them both the same, and perhaps change the clockseq to be treated as an unsigned int to reduce surprise I don't think the value is there to make any change. Just in case I've missed the point, would you mind answering a couple of questions # How did you notice this issue, was it with synthetic tests or through production usage? # Why does it matter to you? Are you trying to integrate with an external system that expects the TimeUUIDs to be ordered in a specific way? [1] node id for Cassandra is generated from a hash of local IP addresses as the comments claim the MAC address was not easily accessible from Java. I'm not sure if the order of the hashed addresses is stable so it's possible the same node with same networking could come up with a different node. https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/utils/UUIDGen.java#L349 The method has been changed for trunk, but assuming you're not running it. [2] Initializing clock seq https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/utils/UUIDGen.java#L295 [3] UUID comparison https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/marshal/UUIDType.java#L90 [4] TimeUUID comparison https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/marshal/TimeUUIDType.java#L44 > Wrong ordering for timeuuid fields > ---------------------------------- > > Key: CASSANDRA-15064 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15064 > Project: Cassandra > Issue Type: Bug > Reporter: Andreas Andersen > Assignee: Jon Meredith > Priority: Normal > Attachments: example.cql > > > Hi! > We're seeing some strange behavior for the ordering of timeuuid fields. They > seem to be sorted in the wrong order when the clock_seq_low field in a > timeuuid goes from 7f to 80. Consider the following example: > {noformat} > cqlsh:test> show version; > [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] > cqlsh:test> CREATE TABLE t ( > ... partition int, > ... t timeuuid, > ... i int, > ... > ... PRIMARY KEY(partition, t) > ... ) > ... WITH CLUSTERING ORDER BY(t ASC); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); > cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; > > partition | t | i > -----------+--------------------------------------+--- > 1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 > 1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 > 1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 > 1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 > 1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 > > (5 rows) > cqlsh:test> > {noformat} > The expected behavior is that the rows are returned in the same order as they > were inserted (we inserted them with their clustering key in an ascending > order). Instead, the order "wraps" in the middle. > This issue only arises when the 9th octet (clock_seq_low) in the uuid goes > from 7f to 80. A guess would be that the comparison is implemented as a > signed integer instead of an unsigned integer, as 0x7f = 127 and 0x80 = -128. > According to the RFC, the field should be treated as an unsigned integer: > [https://tools.ietf.org/html/rfc4122#section-4.1.2] > Changing the field from a timeuuid to a uuid gives the expected correct > behavior: > {noformat} > cqlsh:test> CREATE TABLE t ( > ... partition int, > ... t uuid, > ... i int, > ... > ... PRIMARY KEY(partition, t) > ... ) > ... WITH CLUSTERING ORDER BY(t ASC); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); > cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, > 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); > cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; > > partition | t | i > -----------+--------------------------------------+--- > 1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 > 1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 > 1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 > 1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 > 1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 > > (5 rows) > cqlsh:test>{noformat} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org